Using pycorese#
This notebook demonstrates how to use the pycorese package:
to load a knowledge graph
to run a SPARQL query
to validate a graph against SHACL shapes
to access the classes of the Corese Java API
Install pycorese#
Java Runtime Environment (JRE) 11 or higher is required to run pycorese.
If you don’t have Java installed, please refer to the official website to download and install it.
!java -version
openjdk version "11.0.25" 2024-10-15
OpenJDK Runtime Environment (build 11.0.25+9-post-Ubuntu-1ubuntu122.04)
OpenJDK 64-Bit Server VM (build 11.0.25+9-post-Ubuntu-1ubuntu122.04, mixed mode, sharing)
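The version requirement can also be checked from Python before importing pycorese. The following is a minimal sketch using only the standard library; the helper names `parse_java_major` and `installed_java_major` are illustrative and are not part of pycorese.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from the text printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and older report themselves as 1.x (for example "1.8.0_292").
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major() -> Optional[int]:
    """Return the major version of the `java` found on PATH, or None if absent."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` writes its banner to stderr rather than stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr or result.stdout)


print(parse_java_major('openjdk version "11.0.25" 2024-10-15'))  # prints 11
```

Calling `installed_java_major()` and comparing the result with 11 gives an early, readable failure instead of an error raised later by the Java bridge.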
pycorese is available on PyPI and can be installed using pip:
!pip install pycorese
Download the data files from the GitHub repository:
import os
import sys
if not os.path.exists('./data/beatles.rdf'):
    print('Downloading the data files...')
    !mkdir -p ./data
    !wget https://raw.githubusercontent.com/corese-stack/corese-python/main/examples/data/beatles.rdf -O ./data/beatles.rdf
    !wget https://raw.githubusercontent.com/corese-stack/corese-python/main/examples/data/beatles-validator.ttl -O ./data/beatles-validator.ttl
if sys.platform == 'win32':
    !dir /b .\data\*.*
else:
    !ls ./data
beatles.rdf beatles-validator.ttl
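The cell above relies on the `mkdir` and `wget` shell commands, which may not be available on every platform. As a possible alternative, the same download can be sketched with the Python standard library alone. The function names `missing_files` and `download_data` are illustrative; the URLs are the ones used above.

```python
from pathlib import Path
from urllib.request import urlretrieve

BASE_URL = "https://raw.githubusercontent.com/corese-stack/corese-python/main/examples/data/"
DATA_FILES = ("beatles.rdf", "beatles-validator.ttl")


def missing_files(data_dir):
    """Return the names of the data files that are not yet present in data_dir."""
    return [name for name in DATA_FILES if not (Path(data_dir) / name).exists()]


def download_data(data_dir="./data"):
    """Download only the missing files and return the sorted directory listing."""
    target = Path(data_dir)
    target.mkdir(parents=True, exist_ok=True)
    for name in missing_files(target):
        print(f"Downloading {name}...")
        urlretrieve(BASE_URL + name, str(target / name))
    return sorted(path.name for path in target.iterdir())
```

Running `download_data()` in a notebook cell should leave both files in `./data`; nothing is fetched when they already exist.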
Connect to Corese API#
This section demonstrates loading and querying data with CoreseAPI, which connects to Corese through either the Py4J or the JPype package. If you don’t specify the Java bridge type, the default is Py4J.
#%%timeit -n 1 -r 1
from pycorese.api import CoreseAPI
python_to_java_bridge = 'py4j'
corese = CoreseAPI(java_bridge=python_to_java_bridge)
corese.loadCorese()
High-level API#
Run SELECT query#
import os
data_path = os.path.abspath('./data/beatles.rdf')
query = '''
SELECT *
WHERE {?subject ?p ?o} LIMIT 5'''
graph = corese.loadRDF(data_path)
results = corese.sparqlSelect(graph, query=query, return_dataframe=True)
results
| | subject | p | o |
---|---|---|---|
0 | http://example.com/Please_Please_Me | http://example.com/artist | http://example.com/The_Beatles |
1 | http://example.com/McCartney | http://example.com/artist | http://example.com/Paul_McCartney |
2 | http://example.com/Imagine | http://example.com/artist | http://example.com/John_Lennon |
3 | http://example.com/Please_Please_Me | http://example.com/date | 1963-03-22 |
4 | http://example.com/McCartney | http://example.com/date | 1970-04-17 |
Load inference rules#
corese.resetRuleEngine(graph)
query = "select * where {?s a ?type} order by ?type"
print(corese.sparqlSelect(graph, query=query))
print("Graph size: ", graph.graphSize())
s type
0 http://example.com/Please_Please_Me http://example.com/Album
1 http://example.com/McCartney http://example.com/Album
2 http://example.com/Imagine http://example.com/Album
3 http://example.com/The_Beatles http://example.com/Band
4 http://example.com/John_Lennon http://example.com/SoloArtist
5 http://example.com/Paul_McCartney http://example.com/SoloArtist
6 http://example.com/Ringo_Starr http://example.com/SoloArtist
7 http://example.com/George_Harrison http://example.com/SoloArtist
8 http://example.com/Love_Me_Do http://example.com/Song
Graph size: 29
Adding inference rules to the Corese engine should change the results of the query by adding new triples.
corese.loadRuleEngine(graph, profile=corese.RuleEngine.Profile.RDFS)
print("Graph size: ", graph.graphSize())
Graph size: 33
Let’s see what was added.
query = "select * where {?s a ?type} order by ?type"
print(corese.sparqlSelect(graph, query=query))
print("Graph size: ", graph.graphSize())
s type
0 http://example.com/Please_Please_Me http://example.com/Album
1 http://example.com/McCartney http://example.com/Album
2 http://example.com/Imagine http://example.com/Album
3 http://example.com/The_Beatles http://example.com/Band
4 http://example.com/John_Lennon http://example.com/Person
5 http://example.com/Paul_McCartney http://example.com/Person
6 http://example.com/Ringo_Starr http://example.com/Person
7 http://example.com/George_Harrison http://example.com/Person
8 http://example.com/John_Lennon http://example.com/SoloArtist
9 http://example.com/Paul_McCartney http://example.com/SoloArtist
10 http://example.com/Ringo_Starr http://example.com/SoloArtist
11 http://example.com/George_Harrison http://example.com/SoloArtist
12 http://example.com/Love_Me_Do http://example.com/Song
Graph size: 33
The RDFS rules inferred that each solo artist is also a person, although this was not explicitly stated in the data. The four inferred triples account for the graph growing from 29 to 33.
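To make this kind of entailment concrete, here is a toy, pure-Python sketch of the RDFS subclass rule (rdfs9): if `x` has type `C` and `C` is a subclass of `D`, then `x` also has type `D`. This is not how Corese implements its rule engine, and the schema triple stating that `ex:SoloArtist` is a subclass of `ex:Person` is an assumption made for the illustration.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def apply_rdfs9(triples):
    """Return the triples plus every (x rdf:type D) entailed by subclass axioms."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new triple appears (a fixpoint)
        changed = False
        subclass_pairs = [(s, o) for s, p, o in inferred if p == SUBCLASS_OF]
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for sub, sup in subclass_pairs:
                new_triple = (s, RDF_TYPE, sup)
                if o == sub and new_triple not in inferred:
                    inferred.add(new_triple)
                    changed = True
    return inferred


toy_graph = {
    ("ex:SoloArtist", SUBCLASS_OF, "ex:Person"),  # assumed schema triple
    ("ex:John_Lennon", RDF_TYPE, "ex:SoloArtist"),
    ("ex:Paul_McCartney", RDF_TYPE, "ex:SoloArtist"),
    ("ex:Ringo_Starr", RDF_TYPE, "ex:SoloArtist"),
    ("ex:George_Harrison", RDF_TYPE, "ex:SoloArtist"),
}
closure = apply_rdfs9(toy_graph)
print(len(toy_graph), "->", len(closure))  # prints: 5 -> 9
```

As in the Corese run, four new type triples appear, one per solo artist.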
Run CONSTRUCT query#
prefixes = '@prefix ex: <http://example.com/>'
construct = '''CONSTRUCT {?A_Beatle a ex:BandMember }
WHERE { ex:The_Beatles ex:member ?A_Beatle}'''
results = corese.sparqlConstruct(graph, prefixes=prefixes, query=construct)
print(results)
<?xml version="1.0"?>
<rdf:RDF
xmlns:ex='http://example.com/'
xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
<ex:BandMember rdf:about='http://example.com/Ringo_Starr'>
</ex:BandMember>
<ex:BandMember rdf:about='http://example.com/John_Lennon'>
</ex:BandMember>
<ex:BandMember rdf:about='http://example.com/George_Harrison'>
</ex:BandMember>
<ex:BandMember rdf:about='http://example.com/Paul_McCartney'>
</ex:BandMember>
</rdf:RDF>
By default, the CONSTRUCT query returns its results in RDF/XML format. For a more concise format, convert the results to Turtle.
ttl = corese.toTurtle(results)
print(ttl)
<http://example.com/George_Harrison> a <http://example.com/BandMember> .
<http://example.com/John_Lennon> a <http://example.com/BandMember> .
<http://example.com/Paul_McCartney> a <http://example.com/BandMember> .
<http://example.com/Ringo_Starr> a <http://example.com/BandMember> .
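If the RDF/XML text needs to be inspected from plain Python, the standard library can parse it. The sketch below assumes the CONSTRUCT result has been converted to a Python string (for example with `str(results)`, which is an assumption about the returned object) and that it only contains typed node elements like the ones printed above. The helper name `typed_resources` is illustrative.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def typed_resources(rdf_xml):
    """Return sorted (subject IRI, class IRI) pairs from typed node elements."""
    root = ET.fromstring(rdf_xml)
    pairs = []
    for node in root:
        subject = node.get(f"{{{RDF_NS}}}about")
        if subject is None or not node.tag.startswith("{"):
            continue
        # ElementTree exposes qualified tags as "{namespace}local".
        namespace, _, local = node.tag[1:].partition("}")
        pairs.append((subject, namespace + local))
    return sorted(pairs)


sample = """<?xml version="1.0"?>
<rdf:RDF
    xmlns:ex='http://example.com/'
    xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
  <ex:BandMember rdf:about='http://example.com/Ringo_Starr'>
  </ex:BandMember>
  <ex:BandMember rdf:about='http://example.com/John_Lennon'>
  </ex:BandMember>
</rdf:RDF>"""

for subject, rdf_class in typed_resources(sample):
    print(subject, "a", rdf_class)
```

This is only suitable for small, regular results; for general RDF/XML, converting to Turtle with Corese as shown above is the more robust route.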
Run SHACL form validation#
In the example below, we use a SHACL shape file that checks whether the beatles graph follows these rules:
A band has one name and at least one member who is also a solo artist
An album has one name, one date and one artist associated with it
A song has one name, one length and at least one writer and at least one performer associated with it
The validation should fail because the beatles graph does not contain all the required information.
data_shape_path = os.path.abspath('./data/beatles-validator.ttl')
with open(data_shape_path, 'r') as file:
    data_shape = file.read()
print(data_shape)
PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ex: <http://example.com/>
# Shape for Bands
ex:BandShape a sh:NodeShape ;
sh:targetClass ex:Band ;
sh:property [
sh:path ex:name ;
sh:datatype xsd:string ;
sh:minCount 1 ;
sh:maxCount 1 ;
] ;
sh:property [
sh:path ex:member ;
sh:class ex:SoloArtist ;
sh:minCount 1 ;
] .
# Shape for Solo Artists
ex:SoloArtistShape a sh:NodeShape ;
sh:targetClass ex:SoloArtist .
# Shape for Albums
ex:AlbumShape a sh:NodeShape ;
sh:targetClass ex:Album ;
sh:property [
sh:path ex:name ;
sh:datatype xsd:string ;
sh:minCount 1 ;
sh:maxCount 1 ;
] ;
sh:property [
sh:path ex:date ;
sh:datatype xsd:date ;
sh:minCount 1 ;
sh:maxCount 1 ;
] ;
sh:property [
sh:path ex:artist ;
sh:nodeKind sh:IRI ;
sh:minCount 1 ;
sh:maxCount 1 ;
] .
# Shape for Songs
ex:SongShape a sh:NodeShape ;
sh:targetClass ex:Song ;
sh:property [
sh:path ex:name ;
sh:datatype xsd:string ;
sh:minCount 1 ;
sh:maxCount 1 ;
] ;
sh:property [
sh:path ex:length ;
sh:datatype xsd:integer ;
sh:minCount 1 ;
sh:maxCount 1 ;
] ;
sh:property [
sh:path ex:performer ;
sh:nodeKind sh:IRI ;
sh:minCount 1 ;
] ;
sh:property [
sh:path ex:writer ;
sh:nodeKind sh:IRI ;
sh:minCount 1 ;
] .
prefixes = '@prefix ex: <http://example.com/>'
report = corese.shaclValidate(graph, shacl_shape_ttl=data_shape_path, prefixes=prefixes)
print(report)
@prefix xsh: <http://www.w3.org/ns/shacl#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
<urn:uuid:66d7b5ea-0065-4f84-b0e4-d65ba0b16a11> a sh:ValidationResult ;
sh:focusNode <http://example.com/Love_Me_Do> ;
sh:resultMessage "Fail at: [sh:minCount 1 ;\n sh:nodeKind sh:IRI ;\n sh:path <http://example.com/performer>]" ;
sh:resultPath <http://example.com/performer> ;
sh:resultSeverity sh:Violation ;
sh:sourceConstraintComponent sh:MinCountConstraintComponent ;
sh:sourceShape _:b7 ;
sh:value 0 .
[a sh:ValidationReport ;
sh:conforms false ;
sh:result <urn:uuid:66d7b5ea-0065-4f84-b0e4-d65ba0b16a11>] .
The SHACL validation report is verbose and can be reshaped into a DataFrame for readability.
report_dataframe = corese.shaclReportToDataFrame(report)
report_dataframe
o | type | focusNode | resultMessage | resultPath | resultSeverity | sourceConstraintComponent | sourceShape | value |
---|---|---|---|---|---|---|---|---|
urn:uuid:66d7b5ea-0065-4f84-b0e4-d65ba0b16a11 | http://www.w3.org/ns/shacl#ValidationResult | http://example.com/Love_Me_Do | Fail at: [sh:minCount 1 ; sh:nodeKind sh:IRI... | http://example.com/performer | http://www.w3.org/ns/shacl#Violation | http://www.w3.org/ns/shacl#MinCountConstraintC... | _:b9 | 0 |
The report tells us that for the song Love Me Do a performer is not specified.
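The violated constraint is `sh:minCount 1` on the `ex:performer` path. As a rough illustration of what such a constraint checks, the following pure-Python sketch counts the values of a property for every instance of a target class. It is not the Corese SHACL engine, and the toy triples are invented for the example rather than taken from the data file.

```python
def min_count_violations(triples, target_class, path, min_count=1):
    """Return (focus node, value count) for nodes with fewer than min_count values."""
    focus_nodes = sorted(
        {s for s, p, o in triples if p == "rdf:type" and o == target_class}
    )
    violations = []
    for node in focus_nodes:
        values = [o for s, p, o in triples if s == node and p == path]
        if len(values) < min_count:
            violations.append((node, len(values)))
    return violations


# Toy data: a song with writers but no performer.
toy_triples = [
    ("ex:Love_Me_Do", "rdf:type", "ex:Song"),
    ("ex:Love_Me_Do", "ex:writer", "ex:John_Lennon"),
    ("ex:Love_Me_Do", "ex:writer", "ex:Paul_McCartney"),
]

print(min_count_violations(toy_triples, "ex:Song", "ex:performer"))
# prints: [('ex:Love_Me_Do', 0)]
print(min_count_violations(toy_triples, "ex:Song", "ex:writer"))
# prints: []
```

The pair `('ex:Love_Me_Do', 0)` corresponds to the `sh:focusNode` and `sh:value 0` fields of the report above.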
Low-level API#
Adding triples manually to the graph#
# Namespace
ex = "http://example.com/"
# Get the graph from either Graph or DataManager objects
graph = graph.getGraph()
# Create and add statements: Help! is an album
new_album_IRI = graph.addResource(ex + "Help")
rdf_Type_Property = graph.addProperty(corese.Namespaces.RDF + 'type')
album_type_IRI = graph.addResource(ex + "Album")
graph.addEdge(new_album_IRI, rdf_Type_Property, album_type_IRI)
JavaObject id=o37
Let’s see what was added.
query = f'''@prefix ex: <{ex}>
SELECT *
where {{?album a ex:Album }}'''
# Named query_process to avoid shadowing Python's built-in exec()
query_process = corese.QueryProcess.create(graph)
results = query_process.query(query)
print(results)
01 ?album = <http://example.com/Please_Please_Me>;
02 ?album = <http://example.com/McCartney>;
03 ?album = <http://example.com/Imagine>;
04 ?album = <http://example.com/Help>;
The new triple (album Help) was added to the graph.
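The doubled braces in the query string above are Python f-string escapes, not SPARQL syntax: inside an f-string, `{{` and `}}` produce literal `{` and `}`, while `{ex}` is replaced by the value of the variable. A short check with the same query text shows what is actually sent to the query engine.

```python
ex = "http://example.com/"

# {ex} is interpolated; {{ and }} become the single braces SPARQL expects.
query = f'''@prefix ex: <{ex}>
SELECT *
where {{?album a ex:Album }}'''

print(query)
# prints:
# @prefix ex: <http://example.com/>
# SELECT *
# where {?album a ex:Album }
```

Printing the query before running it is a quick way to catch a missing or unescaped brace.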
We can add some more details for the album Help! and see what was added.
# Create and add statement: The name of the album is actually Help!
name_property_IRI = graph.addProperty(ex + "name")
name_literal = graph.addLiteral("Help!")
graph.addEdge(new_album_IRI, name_property_IRI, name_literal)
# Create and add statement: The new album was released on 1965-08-06
# (an xsd:date literal needs the full YYYY-MM-DD form)
xsd = "http://www.w3.org/2001/XMLSchema#"
release_property_IRI = graph.addProperty(ex + "date")
release_literal = graph.addLiteral("1965-08-06", xsd + 'date')
graph.addEdge(new_album_IRI, release_property_IRI, release_literal)
# Create and add statement: The Beatles is the creator of the album Help
# The artist is a resource (IRI), not a string literal
artist_property_IRI = graph.addProperty(ex + "artist")
artist_IRI = graph.addResource(ex + "The_Beatles")
graph.addEdge(new_album_IRI, artist_property_IRI, artist_IRI)
JavaObject id=o46
query = f'''@prefix ex: <{ex}>
CONSTRUCT {{ ?album ?p ?o }}
WHERE {{
VALUES ?album {{ ex:Help }}
?album ?p ?o}} '''
query_process = corese.QueryProcess.create(graph)
results = query_process.query(query)
results_ttl = corese.ResultFormat.create(results, corese.ResultFormat.TURTLE_FORMAT)
print(results_ttl)
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.com/> .
ex:Help ex:artist ex:The_Beatles ;
ex:date "1965-08-06"^^xsd:date ;
ex:name "Help!" ;
a ex:Album .
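When typed literals are created by hand, it can help to check their lexical form first, because a literal is stored with whatever string it is given. The sketch below covers only the plain `YYYY-MM-DD` form of `xsd:date`; time zone suffixes, negative years and years with more than four digits, which the datatype also allows, are deliberately left out. The helper name `is_simple_xsd_date` is illustrative.

```python
import re
from datetime import date

_SIMPLE_XSD_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def is_simple_xsd_date(text):
    """Check the plain YYYY-MM-DD form of xsd:date (a simplified test)."""
    match = _SIMPLE_XSD_DATE.fullmatch(text)
    if match is None:
        return False
    year, month, day = map(int, match.groups())
    try:
        date(year, month, day)  # rejects impossible dates such as 1965-02-30
    except ValueError:
        return False
    return True


print(is_simple_xsd_date("1965-08-06"))  # prints True
print(is_simple_xsd_date("1965"))        # prints False
print(is_simple_xsd_date("1965-02-30"))  # prints False
```

A year on its own, such as "1965", does not pass this check; it would fit `xsd:gYear` rather than `xsd:date`.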