Pipeline Inputs
To start the pipeline, it needs the path to the text file, or the directory containing the text files, as well as a name for the project.
pipeline(data_path=text_path)
By default, the pipeline creates an output directory in the working directory my_project
and stores all pipeline results in it.
Other names can be given through project_name
parameter.
pipeline(data_path=text_path, project_name='travel_records')
By default, the pipeline uses the basis ontology nlpGraph_onto.ttl
to generate the graph and the SHACL shapes given
in nlpGraph_shacl.ttl
to validate the constructed graph.
The user can provide a custom ontology and its corresponding SHACL shapes as inputs to the pipeline serialized in Turtle format. The path of the ontology and SHACL shapes turtle files must be given as the input parameters when starting the pipeline.
pipeline(data_path=text_path,
ontology_path=ontology_path,
shacl_path=shacl_path,
project_name=project_name)
For examples of custom ontology and custom SHACL shapes graph, see files example_onto.ttl
and example_shacl.ttl
in
inputs
directory of the code base.
What can custom ontologies contains?
- Namespaces and prefixes that must be used to create the resources. For example, you can set the prefix
ex
for your project:@prefix ex: <http://www.example.com/ontology/> . @prefix : <http://www.example.com/resource/> .
The resources will be created with IRIs having the default namespace as its main part.
- If you wish more datatype properties to be retrieved from wikidata for a person or a location,
you must add that property to your ontology specifying it as sub-property of the correspodning wikidata property, as
shown in the example below. In this way, you can tell the pipeline to retrieve information about the birthdate of a person from Wikidata and add this
information to the node of the graph representing the person. Note that the property should have
owl:DatatypeProperty
type!
@prefix wdt: <http://www.wikidata.org/prop/direct/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix nlpg: <http://www.NLPGraph.com/ontology/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
ex:birthDate rdf:type owl:DatatypeProperty ;
rdfs:subPropertyOf wdt:P569 ;
rdfs:label "date of birth"@en ;
rdfs:comment "date of birth of a person."@en ;
rdfs:domain nlpg:Person ;
rdfs:range xsd:date .
- The relations that must be extracted during the second stage of the pipeline through dependency parsing and POS tagging,
should be given as
owl:ObjectProperty
withrdfs:range
andrdfs:domain
which arenlpg:Person
ornlpg:Location
. For example, suppose we want the pipeline to find all locations a person traveled to from the input text, we can define a new property describing this relation between a resource of typenlpg:Person
and resources of typenlpg:Location
, as shown below. Through this property definition, the pipeline will attempt to parse out from the text all the information about places the persons mentioned in the text traveled to. The locations will be stored asnlpg:Location
resource, and an edge will be created connecting the corresponding person and location resources.
ex:traveledTo rdf:type owl:ObjectProperty ;
rdfs:label "Traveled to"@en ;
rdfs:comment "Trip destination."@en ;
rdfs:domain nlpg:Person ;
rdfs:range nlpg:Location .
What can custom SHACL shapes contain?
- Property shapes to check the datatype properties given in the ontology. In this way, the pipeline can validate the information
retrieved from Wikidata. In the example below, a property shape is given to be used to validate the birthdate retrieved from
Wikidata for
nlpg:Person
resources. This shape must be given in a turtle file containing all shapes that must be used to validate the properties given in the custom ontology.
ex:BirthDateShape a sh:PropertyShape ;
sh:path ex:birthDate ;
sh:message "We need at least one birth date value for a person" ;
sh:severity sh:Violation ;
sh:minCount 1 ;
sh:datatype xsd:date ;
sh:targetClass nlpg:Person.
-
The cardinality requirements for a property must be given using the
sh:maxCount
andsh:minCount
predicates defined in SHACL vocabulary. For optional properties, no cardinality restriction is required in the shapes graph. However, for the non-optional properties, such asex:birthDate
above, a minimum cardinality value must be given in the shapes graph throughsh:minCount
predicate. The property shapeex:BirthDateShape
states that eachnlpg:Person
resource in the output data graph, must at least have one birthdate given throughex:birthdate
predicate. That means, the pipeline must retrieve at least one birthdate from Wikidata for each Person entity. -
Property shapes to validate the object properties that represent relation between named entities. We can set different levels of severity of validation failure in case data is not consistent with the shape; for example in case the data is not consistent with this shape a warning is raised with the given custom message.
ex:TraveledToShape a sh:PropertyShape ;
sh:path ex:traveledTo ;
sh:message "No travel destinations found for the person." ;
sh:severity sh:Warning ;
sh:class nlpg:Location ;
sh:targetClass nlpg:Person.