Using pycorese#

This notebook demonstrates how to use the pycorese package:

to load a knowledge graph
to run a SPARQL query
to validate a graph against a SHACL shape file
to access the classes of the Corese Java API

Install pycorese#

Java Runtime Environment (JRE) 11 or higher is required to run pycorese.

If you don’t have Java installed, please refer to the official website to download and install it.

!java -version

openjdk version "11.0.25" 2024-10-15
OpenJDK Runtime Environment (build 11.0.25+9-post-Ubuntu-1ubuntu122.04)
OpenJDK 64-Bit Server VM (build 11.0.25+9-post-Ubuntu-1ubuntu122.04, mixed mode, sharing)
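
The same check can be done from Python before importing pycorese. The following is a minimal sketch and not part of pycorese: `java_major_version` and `installed_java_major` are hypothetical helpers that parse the report printed by `java -version`, assuming the usual `version "X.Y.Z"` format (including the legacy `1.8` numbering).

```python
import re
import shutil
import subprocess


def java_major_version(version_output):
    """Return the major Java version parsed from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy numbering: Java 8 reports itself as "1.8.0_x".
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major():
    """Run `java -version`; return the major version, or None if Java is not on PATH."""
    if shutil.which("java") is None:
        return None
    # `java -version` usually writes its report to stderr, so check that first.
    completed = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major_version(completed.stderr or completed.stdout)
```

A result of `None`, or a number below 11, means the requirement stated above is not met.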

pycorese is available on PyPI and can be installed using pip:

!pip install pycorese

Download the data files from the GitHub repository:

import os
import sys
if not os.path.exists('./data/beatles.rdf'):
    print('Downloading the data files...')
    !mkdir -p ./data
    !wget https://raw.githubusercontent.com/corese-stack/corese-python/main/examples/data/beatles.rdf -O ./data/beatles.rdf
    !wget https://raw.githubusercontent.com/corese-stack/corese-python/main/examples/data/beatles-validator.ttl -O ./data/beatles-validator.ttl

if sys.platform == 'win32':
    !dir /b .\data\*.*
else:
    !ls ./data

beatles.rdf  beatles-validator.ttl
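
The cell above relies on `mkdir -p` and `wget`, which are not available in every environment. A portable alternative that uses only the Python standard library might look like the sketch below. `download_if_missing` is a hypothetical helper name; the URLs are the ones used above.

```python
import os
import urllib.request


def download_if_missing(url, dest_path):
    """Download url to dest_path unless the file already exists.

    Returns True if a download happened, False if the file was already there.
    """
    if os.path.exists(dest_path):
        return False
    # Create the target directory if needed (equivalent to `mkdir -p`).
    os.makedirs(os.path.dirname(os.path.abspath(dest_path)), exist_ok=True)
    urllib.request.urlretrieve(url, dest_path)
    return True


BASE_URL = "https://raw.githubusercontent.com/corese-stack/corese-python/main/examples/data/"
DATA_FILES = ["beatles.rdf", "beatles-validator.ttl"]

# Example usage (requires network access):
# for name in DATA_FILES:
#     download_if_missing(BASE_URL + name, os.path.join("data", name))
```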

Connect to Corese API#

This section loads and queries data through CoreseAPI, which connects to the Corese Java library through either the Py4J or the JPype package. If you don’t specify the Java bridge type, Py4J is used by default.

#%%timeit -n 1 -r 1
from pycorese.api import CoreseAPI

python_to_java_bridge = 'py4j'
corese = CoreseAPI(java_bridge=python_to_java_bridge)
corese.loadCorese()

High-level API#

Run SELECT query#

import os
data_path = os.path.abspath('./data/beatles.rdf')

query = '''
SELECT *
WHERE {?subject ?p ?o} LIMIT 5'''

graph = corese.loadRDF(data_path)
results = corese.sparqlSelect(graph, query=query, return_dataframe=True)

results

	subject	p	o
0	http://example.com/Please_Please_Me	http://example.com/artist	http://example.com/The_Beatles
1	http://example.com/McCartney	http://example.com/artist	http://example.com/Paul_McCartney
2	http://example.com/Imagine	http://example.com/artist	http://example.com/John_Lennon
3	http://example.com/Please_Please_Me	http://example.com/date	1963-03-22
4	http://example.com/McCartney	http://example.com/date	1970-04-17
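
Full IRIs make the table wide. For display purposes they can be compacted to prefixed names. The sketch below is plain Python and independent of pycorese; `compact_iri` and the `PREFIXES` map are illustrative names, with `ex` assumed to stand for `http://example.com/` as in the later cells.

```python
PREFIXES = {
    "ex": "http://example.com/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def compact_iri(value, prefixes=PREFIXES):
    """Return value as prefix:local if it starts with a known namespace, else unchanged."""
    text = str(value)
    # Try longer namespaces first so the most specific prefix wins.
    for prefix, namespace in sorted(prefixes.items(), key=lambda kv: -len(kv[1])):
        if text.startswith(namespace) and len(text) > len(namespace):
            return f"{prefix}:{text[len(namespace):]}"
    return text


print(compact_iri("http://example.com/Please_Please_Me"))  # ex:Please_Please_Me
print(compact_iri("1963-03-22"))  # literals are left unchanged
```

Assuming `results` is a pandas DataFrame, as the output above suggests, `results.apply(lambda col: col.map(compact_iri))` would apply the helper to every cell.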

Load inference rules#

corese.resetRuleEngine(graph)
query = "select * where {?s a ?type} order by ?type"
print(corese.sparqlSelect(graph, query=query))
print("Graph size: ", graph.graphSize())

                                     s                           type
http://example.com/Please_Please_Me       http://example.com/Album
       http://example.com/McCartney       http://example.com/Album
         http://example.com/Imagine       http://example.com/Album
     http://example.com/The_Beatles        http://example.com/Band
     http://example.com/John_Lennon  http://example.com/SoloArtist
  http://example.com/Paul_McCartney  http://example.com/SoloArtist
     http://example.com/Ringo_Starr  http://example.com/SoloArtist
 http://example.com/George_Harrison  http://example.com/SoloArtist
      http://example.com/Love_Me_Do        http://example.com/Song
Graph size:  29

Loading inference rules into the Corese engine adds inferred triples to the graph, so the same query should now return more results.

corese.loadRuleEngine(graph, profile=corese.RuleEngine.Profile.RDFS)
print("Graph size: ", graph.graphSize())

Graph size:  33

Let’s see what was added.

query = "select * where {?s a ?type} order by ?type"
print(corese.sparqlSelect(graph, query=query))
print("Graph size: ", graph.graphSize())

                                      s                           type
 http://example.com/Please_Please_Me       http://example.com/Album
        http://example.com/McCartney       http://example.com/Album
          http://example.com/Imagine       http://example.com/Album
      http://example.com/The_Beatles        http://example.com/Band
      http://example.com/John_Lennon      http://example.com/Person
   http://example.com/Paul_McCartney      http://example.com/Person
      http://example.com/Ringo_Starr      http://example.com/Person
  http://example.com/George_Harrison      http://example.com/Person
      http://example.com/John_Lennon  http://example.com/SoloArtist
   http://example.com/Paul_McCartney  http://example.com/SoloArtist
     http://example.com/Ringo_Starr  http://example.com/SoloArtist
 http://example.com/George_Harrison  http://example.com/SoloArtist
      http://example.com/Love_Me_Do        http://example.com/Song
Graph size:  33

The rule engine inferred that each solo artist is also a person, although this is not explicitly stated in the data.
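
The four extra triples are consistent with the RDFS subclass rule (rdfs9): if a resource has type C and C is a subclass of D, then the resource also has type D. The pure-Python sketch below illustrates that rule on a tiny triple set. It is not how Corese implements its rule engine, and it assumes the data declares `ex:SoloArtist rdfs:subClassOf ex:Person`, which the notebook does not show explicitly.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def apply_subclass_rule(triples):
    """Return triples plus every rdf:type triple entailed via rdfs:subClassOf."""
    inferred = set(triples)
    changed = True
    # Repeat until no new triple is produced, so chains of subclasses are followed.
    while changed:
        changed = False
        subclass_pairs = [(s, o) for s, p, o in inferred if p == SUBCLASS_OF]
        for subject, predicate, cls in list(inferred):
            if predicate != RDF_TYPE:
                continue
            for sub, sup in subclass_pairs:
                new_triple = (subject, RDF_TYPE, sup)
                if cls == sub and new_triple not in inferred:
                    inferred.add(new_triple)
                    changed = True
    return inferred


# Assumed schema triple plus the four solo artists from the query result above.
data = {("ex:SoloArtist", SUBCLASS_OF, "ex:Person")}
for artist in ("John_Lennon", "Paul_McCartney", "Ringo_Starr", "George_Harrison"):
    data.add((f"ex:{artist}", RDF_TYPE, "ex:SoloArtist"))

entailed = apply_subclass_rule(data)
print(len(entailed) - len(data))  # prints 4: one ex:Person triple per solo artist
```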

Run CONSTRUCT query#

prefixes = '@prefix ex: <http://example.com/>'
construct = '''CONSTRUCT {?A_Beatle a ex:BandMember }
               WHERE { ex:The_Beatles ex:member ?A_Beatle}'''

results = corese.sparqlConstruct(graph, prefixes=prefixes, query=construct)

print(results)

<?xml version="1.0"?>
<rdf:RDF
  xmlns:ex='http://example.com/'
  xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>

  <ex:BandMember rdf:about='http://example.com/Ringo_Starr'>
  </ex:BandMember>

  <ex:BandMember rdf:about='http://example.com/John_Lennon'>
  </ex:BandMember>

  <ex:BandMember rdf:about='http://example.com/George_Harrison'>
  </ex:BandMember>

  <ex:BandMember rdf:about='http://example.com/Paul_McCartney'>
  </ex:BandMember>

</rdf:RDF>

By default, the CONSTRUCT query returns its result in RDF/XML format. For a more concise format, convert the results to Turtle.

ttl = corese.toTurtle(results)

print(ttl)

<http://example.com/George_Harrison> a <http://example.com/BandMember> .

<http://example.com/John_Lennon> a <http://example.com/BandMember> .

<http://example.com/Paul_McCartney> a <http://example.com/BandMember> .

<http://example.com/Ringo_Starr> a <http://example.com/BandMember> .
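
When the result is needed as Python values rather than as serialized text, the RDF/XML string can also be read with the standard library. This is a minimal sketch that only handles the flat layout shown above (one typed element per resource with an `rdf:about` attribute). It is not a general RDF/XML parser, and `typed_subjects` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def typed_subjects(rdf_xml, namespace, local_name):
    """Return the sorted rdf:about IRIs of elements whose tag is namespace + local_name."""
    root = ET.fromstring(rdf_xml.strip())
    tag = f"{{{namespace}}}{local_name}"
    about = f"{{{RDF_NS}}}about"
    return sorted(
        element.attrib[about] for element in root.iter(tag) if about in element.attrib
    )


# Shortened copy of the RDF/XML output above.
sample = """<?xml version="1.0"?>
<rdf:RDF
  xmlns:ex='http://example.com/'
  xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
  <ex:BandMember rdf:about='http://example.com/Ringo_Starr'>
  </ex:BandMember>
  <ex:BandMember rdf:about='http://example.com/John_Lennon'>
  </ex:BandMember>
</rdf:RDF>
"""

print(typed_subjects(sample, "http://example.com/", "BandMember"))
```

With the notebook's objects, the analogous call would pass `str(results)` from the CONSTRUCT cell instead of `sample`.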

Run SHACL form validation#

In the example below, we use a SHACL shape file to check that the beatles graph follows these rules:

A band has one name and at least one member who is also a Solo Artist
An album has one name, one date and one artist associated with it
A song has one name, one duration, at least one writer and at least one performer associated with it

The validation should fail because the beatles graph does not contain all of the required information.

data_shape_path = os.path.abspath('./data/beatles-validator.ttl')

with open(data_shape_path, 'r') as file:
    data_shape = file.read()
    print(data_shape)

PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ex: <http://example.com/>

# Shape for Bands
ex:BandShape a sh:NodeShape ;
    sh:targetClass ex:Band ;
    sh:property [
        sh:path ex:name ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] ;
    sh:property [
        sh:path ex:member ;
        sh:class ex:SoloArtist ;
        sh:minCount 1 ;
    ] .

# Shape for Solo Artists
ex:SoloArtistShape a sh:NodeShape ;
    sh:targetClass ex:SoloArtist .

# Shape for Albums
ex:AlbumShape a sh:NodeShape ;
    sh:targetClass ex:Album ;
    sh:property [
        sh:path ex:name ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] ;
    sh:property [
        sh:path ex:date ;
        sh:datatype xsd:date ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] ;
    sh:property [
        sh:path ex:artist ;
        sh:nodeKind sh:IRI ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] .

# Shape for Songs
ex:SongShape a sh:NodeShape ;
    sh:targetClass ex:Song ;
    sh:property [
        sh:path ex:name ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] ;
    sh:property [
        sh:path ex:length ;
        sh:datatype xsd:integer ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] ;
        sh:property [
        sh:path ex:performer ;
        sh:nodeKind sh:IRI ;
        sh:minCount 1 ;
    ] ;
    sh:property [
        sh:path ex:writer ;
        sh:nodeKind sh:IRI ;
        sh:minCount 1 ;
    ] .

prefixes = '@prefix ex: <http://example.com/>'
report = corese.shaclValidate(graph, shacl_shape_ttl=data_shape_path, prefixes=prefixes)

print(report)

@prefix xsh: <http://www.w3.org/ns/shacl#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

<urn:uuid:66d7b5ea-0065-4f84-b0e4-d65ba0b16a11> a sh:ValidationResult ;
  sh:focusNode <http://example.com/Love_Me_Do> ;
  sh:resultMessage "Fail at: [sh:minCount 1 ;\n  sh:nodeKind sh:IRI ;\n  sh:path <http://example.com/performer>]" ;
  sh:resultPath <http://example.com/performer> ;
  sh:resultSeverity sh:Violation ;
  sh:sourceConstraintComponent sh:MinCountConstraintComponent ;
  sh:sourceShape _:b7 ;
  sh:value 0 .

[a sh:ValidationReport ;
  sh:conforms false ;
  sh:result <urn:uuid:66d7b5ea-0065-4f84-b0e4-d65ba0b16a11>] .

The SHACL validation report is verbose and can be reshaped into a DataFrame for readability.

report_dataframe = corese.shaclReportToDataFrame(report)

report_dataframe

	type	focusNode	resultMessage	resultPath	resultSeverity	sourceConstraintComponent	sourceShape	value
o
urn:uuid:66d7b5ea-0065-4f84-b0e4-d65ba0b16a11	http://www.w3.org/ns/shacl#ValidationResult	http://example.com/Love_Me_Do	Fail at: [sh:minCount 1 ; sh:nodeKind sh:IRI...	http://example.com/performer	http://www.w3.org/ns/shacl#Violation	http://www.w3.org/ns/shacl#MinCountConstraintC...	_:b9	0

The report tells us that for the song Love Me Do a performer is not specified.
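
To make the violation concrete, the sketch below re-creates the meaning of `sh:minCount` in plain Python. It illustrates the constraint only and is not Corese's validator. The triples for `ex:Love_Me_Do` are assumed for the example, since the notebook does not print them.

```python
from collections import defaultdict


def min_count_violations(triples, focus_nodes, required_paths):
    """Return (focus_node, path, count) for each path with fewer values than required.

    required_paths maps a property to its minimum number of values.
    """
    value_counts = defaultdict(int)
    for subject, predicate, _obj in triples:
        value_counts[(subject, predicate)] += 1

    violations = []
    for node in focus_nodes:
        for path, minimum in required_paths.items():
            count = value_counts[(node, path)]
            if count < minimum:
                violations.append((node, path, count))
    return violations


# Hypothetical data: a song with a name, a length and a writer, but no performer.
triples = [
    ("ex:Love_Me_Do", "ex:name", "Love Me Do"),
    ("ex:Love_Me_Do", "ex:length", 125),
    ("ex:Love_Me_Do", "ex:writer", "ex:John_Lennon"),
]
song_shape = {"ex:name": 1, "ex:length": 1, "ex:performer": 1, "ex:writer": 1}

print(min_count_violations(triples, ["ex:Love_Me_Do"], song_shape))
# prints [('ex:Love_Me_Do', 'ex:performer', 0)]
```

The count of 0 in the result plays the same role as `sh:value 0` in the report above.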

Low-level API#

Adding triples manually to the graph#

# Namespace
ex = "http://example.com/"

# Get the graph from either Graph or DataManager objects
graph = graph.getGraph()

# Create and add statements: Help! is an album
new_album_IRI = graph.addResource(ex + "Help")
rdf_Type_Property = graph.addProperty(corese.Namespaces.RDF + 'type')
album_type_IRI = graph.addResource(ex + "Album")

graph.addEdge(new_album_IRI, rdf_Type_Property, album_type_IRI)

JavaObject id=o37

Let’s see what was added.

query = f'''@prefix ex: <{ex}>
            SELECT *
            where {{?album a ex:Album }}'''

# Named query_process to avoid shadowing Python's built-in exec()
query_process = corese.QueryProcess.create(graph)

results = query_process.query(query)

print(results)

?album = <http://example.com/Please_Please_Me>; 
?album = <http://example.com/McCartney>; 
?album = <http://example.com/Imagine>; 
?album = <http://example.com/Help>; 

The new triple (album Help) was added to the graph.
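
The query above is built with an f-string, which is why the SPARQL braces are doubled: `{{` and `}}` produce literal braces, while `{ex}` is replaced by the namespace string. A small self-contained check of that behavior (the variable name `demo_query` is only for illustration):

```python
ex = "http://example.com/"

# Doubled braces are emitted literally; {ex} is substituted with the namespace.
demo_query = f"""@prefix ex: <{ex}>
SELECT *
WHERE {{ ?album a ex:Album }}"""

print(demo_query)
```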

We can add some more details for the album Help! and see what was added.

# Create and add statement: The name of the album is actually Help!
name_property_IRI = graph.addProperty(ex + "name")
name_literal = graph.addLiteral("Help!")

graph.addEdge(new_album_IRI, name_property_IRI, name_literal)

# Create and add statement: The new album was released in 1965
xsd = "http://www.w3.org/2001/XMLSchema#"
release_property_IRI = graph.addProperty(ex + "date")
release_literal = graph.addLiteral("1965", xsd + 'date')

graph.addEdge(new_album_IRI, release_property_IRI, release_literal)


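# Create and add statement: The Beatles is the creator of the album Help
artist_property_IRI = graph.addProperty(ex + "artist")
# Note: addLiteral stores the IRI as a plain string literal, as the output below shows;
# use graph.addResource(ex + "The_Beatles") instead to link to the ex:The_Beatles resource.
artist_IRI = graph.addLiteral(ex + "The_Beatles")
graph.addEdge(new_album_IRI, artist_property_IRI, artist_IRI)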

JavaObject id=o46

query = f'''@prefix ex: <{ex}>
            CONSTRUCT {{ ?album ?p ?o }}
            WHERE {{
                VALUES ?album {{ ex:Help }}
                ?album ?p ?o}} '''

# Named query_process to avoid shadowing Python's built-in exec()
query_process = corese.QueryProcess.create(graph)

results = query_process.query(query)

results_ttl = corese.ResultFormat.create(results, corese.ResultFormat.TURTLE_FORMAT)

print(results_ttl)

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.com/> .

ex:Help ex:artist "http://example.com/The_Beatles" ;
  ex:date "1965"^^xsd:date ;
  ex:name "Help!" ;
  a ex:Album .
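
Note that `"1965"` is not a valid lexical form for `xsd:date`, which expects `YYYY-MM-DD` (optionally followed by a time zone); as the output shows, the literal is stored as given. A check along the following lines could be run before calling `addLiteral`. It is a simplified sketch: `is_valid_xsd_date` is a hypothetical helper that covers only four-digit positive years and does not range-check the time zone offset.

```python
import re
from datetime import date

# YYYY-MM-DD with an optional time zone: "Z" or +/-hh:mm.
_XSD_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})(Z|[+-]\d{2}:\d{2})?")


def is_valid_xsd_date(text):
    """Return True if text looks like a valid xsd:date (simplified check)."""
    match = _XSD_DATE.fullmatch(text)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.group(1, 2, 3))
    try:
        date(year, month, day)  # rejects impossible dates such as 1965-02-30
    except ValueError:
        return False
    return True


print(is_valid_xsd_date("1965"))        # False
print(is_valid_xsd_date("1965-08-06"))  # True
```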