PyWebAnno
Introduction
PyWebAnno is a command-line tool that is like a for-loop for WebAnno: it can perform a number of the operations you would normally do using the GUI but for many items at once. So, given a list of users and projects, it could modify the permissions of all of these users in these projects or assign all of these users to a project in the list. PyWebAnno does this either via the semi-official remote API or the definitely unofficial and frowned-upon means of modifying the database directly.
If you are using WebAnno for an annotation project with many users or in an educational setting, it may become necessary to have many WebAnno projects or create a large number of user accounts. Though these are tasks that are quickly accomplished with the GUI for individual projects, they can become tedious and time-consuming if you need to, say, create 50 new user accounts or 20 projects that are configured in a specific way. Furthermore, doing this by hand can lead to errors and inconsistencies and perhaps your new projects won't all have the same settings. Even figuring out which projects need correcting can become prohibitively difficult in such a scenario. It is in such situations where PyWebAnno might be useful.
Installation
Installation using pip:
pip install [--user] git+https://git.noc.rub.de/ajroussel/pywebanno
Preliminaries
Permissions
Before you can start, you need (1) a WebAnno user account that is allowed to use WebAnno's remote API and (2) a MySQL user that is allowed to read and write the WebAnno database.
Caution!
Note that the high level of permissions required means that, when using PyWebAnno, you very much can break a lot of things quickly, so be careful.
In order to be able to use WebAnno's remote API, a user must have the role ROLE_REMOTE. This user will also need to have the role ROLE_ADMIN in order to actually enact the various operations PyWebAnno is capable of performing. Both of these roles can be assigned by an administrator on WebAnno's "Users" page.
Global Options
All of the commands and subcommands support a --help option, with which you can get a summary of what arguments and options they support.
-v, --verbosity | Print more (+1) or less (-1) stuff to console. |
-n, --dry-run | Don't actually do anything. |
-u, --user TEXT | Username for WebAnno |
-p, --password TEXT | WebAnno password |
--db-user TEXT | DB username |
--db-password TEXT | DB root password |
-f, --config-file TEXT | Read configuration from this file |
--version | Show the version and exit. |
Probably the most important of these global options are --dry-run and --config-file. Since PyWebAnno does change the state of your WebAnno instance, you'll want to be sure that the command as executed will make the changes you actually want. When you use the --dry-run option, as much of the requested operation as possible is done without actually changing anything for real. The changes that would be made will be listed in the program output on the terminal.
In general, whenever PyWebAnno requires user credentials, it will ask for them. But because it quickly becomes tedious to provide these both for WebAnno itself and for the database every time you run an operation, PyWebAnno allows you to supply this information in a couple of different ways.
It will first try to read this information from the settings file, if one is provided. If the required information is not found there or no file was provided, then it will use the information provided on the command line. Finally, if some credential is still missing after both of these steps, it will ask.
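The lookup order described above can be sketched in a few lines of Python. This is only an illustration of the precedence (settings file, then command line, then interactive prompt), not PyWebAnno's actual code; the function name resolve_credential, its parameters, and the example values are all hypothetical.

```python
from typing import Callable, Mapping, Optional


def resolve_credential(
    key: str,
    file_settings: Mapping[str, Optional[str]],
    cli_options: Mapping[str, Optional[str]],
    ask: Callable[[str], str],
) -> str:
    """Return the value for `key`, following the lookup order described above."""
    # 1. The settings file, if one was provided and it contains the value.
    value = file_settings.get(key)
    if value:
        return value
    # 2. The value given on the command line.
    value = cli_options.get(key)
    if value:
        return value
    # 3. Still missing: ask for it interactively.
    return ask(f"{key}: ")


# Hypothetical inputs: the user name is found in the settings file, while the
# password is in neither source, so the `ask` callback is used for it.
file_settings = {"user": "remote-admin"}
cli_options = {"user": "someone-else"}
fake_prompt = lambda prompt: "typed-in-secret"  # stands in for getpass.getpass

user = resolve_credential("user", file_settings, cli_options, fake_prompt)
password = resolve_credential("password", file_settings, cli_options, fake_prompt)
print(user)
```

In a real program the `ask` callback would typically be getpass.getpass from the standard library, so that the password is not echoed to the terminal.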
Settings Files
The settings files for PyWebAnno use the INI file format. There are four sections in these files, which group related settings, explained in more detail below. Also, an example settings file is provided in defaults.ini in this package's root directory.
Note that while the settings file can contain passwords, it is probably not a good idea to do this in most cases. It is recommended to still enter passwords interactively, even when the other settings are provided from a settings file. If you really want to avoid this, you could in theory pipe the password in on the command line:
pywebanno -p <(pass webanno) project list
The [general] Section
output_encoding
- Encoding to use for information printed to the terminal.
The [webanno] Section
user
- Username for the account PyWebAnno should use. This is the user that was granted ROLE_REMOTE previously.
password
- Password for the account specified by user.
url
- Location of the WebAnno instance to connect to.
The [database] Section
user
- Database user that will perform operations on the database.
url
- URL of the WebAnno database server. (default: localhost)
port
- Port that the WebAnno database server uses. (default: 3306)
password
- Password for the account specified by user.
dbname
- Name for the database that WebAnno uses.
The [notification] Section
from_address
- The email address from which notifications will be sent.
smtp_user
- The username required to log in to the SMTP server.
smtp_password
- Password for the account specified by smtp_user.
smtp_url
- Location of the SMTP server that will send the emails.
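To make the layout of such a file concrete, here is a small sketch that parses a settings file with Python's standard configparser module. The section and key names are the ones listed above; every value (host names, account names) is invented for illustration, and whether PyWebAnno itself uses configparser internally is an assumption. Passwords are deliberately left out, in line with the recommendation above.

```python
import configparser

# Example settings text; all values here are hypothetical placeholders.
EXAMPLE_SETTINGS = """
[general]
output_encoding = utf-8

[webanno]
user = remote-admin
url = https://webanno.example.org

[database]
user = webanno
url = localhost
port = 3306
dbname = webanno

[notification]
from_address = annotation-admin@example.org
smtp_user = annotation-admin
smtp_url = smtp.example.org
"""


def load_settings(text: str) -> configparser.ConfigParser:
    """Parse INI-formatted settings text into a ConfigParser object."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


settings = load_settings(EXAMPLE_SETTINGS)
print(settings.sections())
print(settings["webanno"]["user"])
# INI values are strings; getint() converts the port number.
print(settings.getint("database", "port"))
# No password is stored in the file, so it would have to be asked for.
print(settings.has_option("webanno", "password"))
```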
Userlists and Projectlists
Many, if not most, of the commands provided by PyWebAnno operate on userlists and/or projectlists. These are JSON files with a particular format.
Userlist
A userlist for the purposes of PyWebAnno is a file that contains exactly one JSON array, which consists of a number of JSON objects. Each of these objects must at least contain the key "username" whose value is a unique string. Each of the objects requires "password" in order for the user to be imported into the database, and in order to notify users of their passwords you will also require "email", but these keys are not required for the other operations. A userlist might look like this:
[{"username": "newuser1",
  "password": "pdq4987d",
  "name": "Vorname Nachname",
  "email": "newuser1@domain.org"},
 ...]
In theory any additional keys you find useful to include in the userlist may be included. Such additional keys could come in handy for email notification templates, for example, but they will be ignored otherwise and shouldn't cause any issues when using the userlist with other commands.
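If you assemble userlists by hand or with your own scripts, a quick check of the format rules above can save a failed run. The following is a minimal sketch, not part of PyWebAnno: the function name check_userlist and its extra_keys parameter are hypothetical, and the checks simply mirror the requirements stated above (one JSON array, objects with a unique "username", plus whatever further keys the intended operation needs).

```python
import json


def check_userlist(text, extra_keys=()):
    """Parse userlist text and return the list, or raise ValueError.

    "username" is always required; `extra_keys` names further keys that the
    intended operation needs, e.g. ("password",) or ("email",).
    """
    users = json.loads(text)
    if not isinstance(users, list):
        raise ValueError("a userlist must consist of exactly one JSON array")
    required = ("username",) + tuple(extra_keys)
    seen = set()
    for entry in users:
        if not isinstance(entry, dict):
            raise ValueError("every item in a userlist must be a JSON object")
        for key in required:
            if key not in entry:
                raise ValueError(f"missing key {key!r} in entry {entry!r}")
        if entry["username"] in seen:
            raise ValueError(f"duplicate username {entry['username']!r}")
        seen.add(entry["username"])
    return users


EXAMPLE = """
[{"username": "newuser1",
  "password": "pdq4987d",
  "name": "Vorname Nachname",
  "email": "newuser1@domain.org"}]
"""

users = check_userlist(EXAMPLE, extra_keys=("password", "email"))
print([user["username"] for user in users])
```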
Projectlist
A projectlist, like a userlist, contains exactly one JSON array, which consists of a number of JSON objects. In a projectlist, only the "id" key is required, though the "name" key is usually also present, and a "template" key is often also useful for figuring out what sort of a project a given project is, if it was imported from a template. A projectlist might look like this:
[{"id": 203,
  "name": "namedentities_proj_1"},
 {"id": 204,
  "name": "namedentities_proj_2"},
 ...]
Usage
The following sections describe the main commands this program provides, which correspond to the type of WebAnno entity they modify.
Projects
Whenever PyWebAnno creates new projects, it will also produce a JSON list of the generated projects (a projectlist), which can be used to perform various other tasks for these projects later on. Though you can have a similar list generated using the list command and filtering by a name prefix, it's generally a good idea to hold on to this list, either using the -o/--outputfile option or routing stdout to a file, to more easily manage the new projects.
Subcommands
create NUMPROJECTS
- Generate NUMPROJECTS new projects. Projects can either be created from scratch (using all of the WebAnno default settings) or using a template (provided via the --template option) to configure various settings that you'd like to be the same for all of the new projects. Once the new projects have been created, a projectlist containing information about the newly created projects will be printed on stdout, or if you provide a filepath with -o, --outputfile, written to the given path. By default new projects will be named "project" plus some integer, but you can replace the "project" prefix with one of your choosing with the --prefix option. You can choose a starting integer other than 1 using the --startid option.
list
- Show existing projects on the server. By default this will show all of the existing projects, but you can limit which projects will be shown by specifying a prefix (--prefix). The projects will be listed in a tabular format or a JSON format (--json) equivalent to a projectlist.
rollback PROJECTLIST
- Remove all of the projects in PROJECTLIST from a WebAnno server.
Examples
Delete old users and projects
This example illustrates how you might clean up at the conclusion of an annotation project. The rollback functions can be used to delete all of the users or projects listed in the files given. When you first create a set of WebAnno projects and users, each of those operations results in a JSON list, which contains useful information about the newly created entities. Among other things, these lists are handy for making changes to a particular set of entities later. Here, we supply a JSON-format list to the project and user subcommands in order to delete the projects and users contained in the lists:
pywebanno project rollback projectslist.json
pywebanno user rollback userlist.json
Generate new projects
This command will create 21 new projects based on the template template.zip. The new projects will be numbered sequentially and the names will be prefixed with "group": "group1", "group2", etc.:
pywebanno project create --prefix group --template template.zip 21
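The naming scheme for new projects can be spelled out as a short Python sketch. It assumes, based on the "group1", "group2" example above, that the prefix and the integer are joined without a separator; the function project_names is hypothetical and only mimics the effect of the --prefix and --startid options.

```python
def project_names(count, prefix="project", startid=1):
    """Return `count` sequential names: the prefix followed by an integer."""
    return [f"{prefix}{number}" for number in range(startid, startid + count)]


# Names corresponding to the invocation above (prefix "group", 21 projects).
names = project_names(21, prefix="group")
print(names[0], names[-1])
```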
Users
Users can be managed via direct manipulation of the WebAnno database. Yes, this is as dangerous as it sounds. These functions must be run on a machine that can reach the database.
Subcommands
assign USERLIST PROJECTLIST
- Assign users to projects. Users will be distributed as evenly as possible across the given projects in a randomized order.
create
- Create new user accounts from scratch. Usernames for the new users will consist of the concatenation of a prefix (--prefix), a separator (--sep), and an integer, selected sequentially starting with 1, or optionally with the value of --startid. New accounts are enabled by default, but you can specify whether this should be the case using the --enabled and --disabled options. By default, information about the new user accounts will be output on stdout, but if you prefer, you may specify a file where this output should be written with the -o, --outputfile option.
import
- Create new user accounts from the provided USERLIST.
disable USERLIST
- Disable all of the user accounts in the provided USERLIST.
enable USERLIST
- The opposite of disable.
list
- Show users currently in the WebAnno database. With one of a few options, you can choose which users are shown. Using the --projectlist (or -p) option, you can select all of the users that belong to one of the projects in the given projectlist. Alternatively, you can provide the relevant projects individually using the -i or --include-project option one or more times. Which users are shown can also be limited using a permissions level (-l, --level), or a string prefix for the usernames (--prefix). If you just want to see all of the users currently on the WebAnno server, you can use the --all option. The --json option will produce JSON-format output that can be used as a userlist.
notify USERLIST TEMPLATE
- Send emails containing the information in TEMPLATE to the users in USERLIST. Optionally you can provide a value for the FROM field of the emails to be sent using the --from-address option. Though this can be provided here, it's probably more convenient in most cases to use a settings file.
The template should be a plain text file, which will become the body of the message to be sent. It may contain Python formatting directives, and the information in the USERLIST will be passed to the str.format() function as given. Each entry in the USERLIST should at least contain the "email" key, which gives the address the message will be sent to. Beyond this, any other information in the entry may be incorporated into the message content in the same way.
set-perms [USERNAMES]...
- Modify project role for the given users. These can either be given as arguments on the command line or in a provided userlist (-u, --userlist) or projectlist (-p, --projectlist). When a userlist is provided, the permissions will be set for the listed users, and when a projectlist is provided, the permissions will be set for all the users in any one of the projects listed. With --add you can specify which of the three roles, user, manager, or curator, should be assigned to the given set of users. And with --rm you can remove the given roles.
Examples
Generate user accounts
With the import command, you can add users to your WebAnno instance from a pre-existing USERLIST. However, it may be the case that the user data you have is not already in the JSON format that PyWebAnno requires. If the data is in a tabular format, such as CSV exported from a spreadsheet application, then you can use csvkit and jq to perform this conversion with relative ease and in a way that is adaptable to your particular requirements. This example shows what such a conversion step could look like. Here the CSV file contains a header, which provides the keys for the output JSON. However, since the items in the header are uppercase and PyWebAnno requires lowercase keys, this conversion includes a downcase transformation. The -I option of csvjson (from csvkit) ensures that leading zeroes in ID numbers will be preserved (it disables type conversion, causing all values to be interpreted as strings), and there is also a substitution step that removes the hyphen from "e-mail":
csvjson -I userlist_syntax_wise2021.csv |
jq 'map( with_entries( .key |= (ascii_downcase | sub("-";""))))' > userlist.json
pywebanno user import userlist.json
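If csvkit and jq are not available, roughly the same conversion can be done with Python's standard library. This is an alternative sketch rather than part of PyWebAnno: the function name csv_to_userlist and the sample CSV content are made up for illustration. As with csvjson -I, all values stay strings, so leading zeroes survive, and as in the jq filter, keys are lowercased and the first hyphen is removed.

```python
import csv
import io
import json


def csv_to_userlist(csv_text):
    """Convert CSV text with a header row into a list of userlist entries."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        # Lowercase each header and drop its first hyphen ("E-MAIL" -> "email").
        {key.lower().replace("-", "", 1): value for key, value in row.items()}
        for row in reader
    ]


# Hypothetical spreadsheet export with uppercase headers.
EXAMPLE_CSV = (
    "USERNAME,NAME,E-MAIL,ID\n"
    "newuser1,Vorname Nachname,newuser1@domain.org,007\n"
)

userlist = csv_to_userlist(EXAMPLE_CSV)
print(json.dumps(userlist, indent=2))
```

To convert a real file, read its contents and write the result of json.dumps() to userlist.json before running the import command.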
Add users to projects
The users in userlist.json will be randomly distributed among the projects listed in projectslist.json:
pywebanno user assign userlist.json projectslist.json
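One common way to distribute users "as evenly as possible in a randomized order" is to shuffle the users and then deal them out round-robin. The sketch below shows that idea; it is an assumption about the general technique, not a description of PyWebAnno's internals, and the names assign_users and rng are hypothetical.

```python
import random


def assign_users(usernames, project_ids, rng=None):
    """Shuffle the users, then deal them out to the projects in turn."""
    if rng is None:
        rng = random.Random()
    shuffled = list(usernames)
    rng.shuffle(shuffled)
    assignment = {project_id: [] for project_id in project_ids}
    for index, username in enumerate(shuffled):
        # Round-robin: project sizes can differ by at most one user.
        project_id = project_ids[index % len(project_ids)]
        assignment[project_id].append(username)
    return assignment


# Seven hypothetical users spread over the two example project IDs.
usernames = [f"newuser{number}" for number in range(1, 8)]
assignment = assign_users(usernames, [203, 204], rng=random.Random(0))
for project_id, members in assignment.items():
    print(project_id, len(members))
```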
Send account information to new users
Once you've created some new user accounts, you'll need to communicate to the users the information they need to log in. This can be done with the notify command:
pywebanno user notify userlist.json account-info.txt
This command requires the user information generated by the create command, which can either be passed to stdin or read from a file.
The user notify function is also useful for notifying users about system downtime or supplying information about the annotation tasks, among other things.
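To give an idea of what a template such as account-info.txt might contain, here is a sketch of how a userlist entry can be combined with a template via str.format(). It assumes that the keys of each entry are available as named placeholders such as {username}; the template text and the helper render_message are invented for illustration and say nothing about how PyWebAnno actually sends the mail.

```python
# Hypothetical template text; placeholders are named after userlist keys.
TEMPLATE = """Hello {name},

a WebAnno account has been created for you.

Username: {username}
Password: {password}
"""


def render_message(template, user):
    """Return (recipient address, message body) for one userlist entry."""
    # Keys that the template does not mention are simply ignored by format().
    return user["email"], template.format(**user)


user = {
    "username": "newuser1",
    "password": "pdq4987d",
    "name": "Vorname Nachname",
    "email": "newuser1@domain.org",
}

recipient, body = render_message(TEMPLATE, user)
print(recipient)
print(body)
```

Note that literal curly braces in a template have to be doubled ({{ and }}), since str.format() otherwise treats them as placeholders.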
Documents
Subcommands
add-annotations PROJECTLIST DOCUMENTDIR USERNAME
- DOCUMENTDIR is a folder that contains one file for each document contained in some project in PROJECTLIST. This command will go through the documents in the given projects and select the file in DOCUMENTDIR whose name is the same as the document (apart from the extension, which is likely to be different in the annotated version and is therefore ignored). The additional annotations present in this document, which are not present in the original document, are uploaded as the annotations of USERNAME. Optionally, you can upload annotations for only a selection of the documents by passing a name prefix with the --only option. Only documents whose name begins with this prefix will have new annotations uploaded.
delete-annotations PROJECTLIST USERNAME
- Delete the annotations by USERNAME for all projects in PROJECTLIST. As with add-annotations, here too you can control which documents have annotations removed using the --only option.
import DOCUMENTDIR PROJECTLIST
- Add the documents in DOCUMENTDIR to the projects in PROJECTLIST. With the --method option, you can select how this should be done: all will add all of the documents in DOCUMENTDIR to each of the projects. zip (the default) will add each document in alphabetical order to one project. shufflezip first shuffles the list of projects, thus adding each document to a random project from PROJECTLIST. The --format option specifies the format of the documents in DOCUMENTDIR.
Note
If you plan to add annotations to these documents later on, it is best to also upload the unannotated base texts in the same format you plan to use for the annotations. In my experience, this is the best way to avoid issues with character offsets, for which WebAnno seems to have very strict requirements.
list
- Show documents. As with user list, you select which documents are to be shown by three means: providing project IDs individually for the project(s) whose documents should be listed (-i, --include-project), providing a projectlist containing the projects whose documents should be listed (-p, --projectlist), or specifying that all documents currently on the server should be shown (-a, --all). When the --json option is present, the output will be in JSON format.
remove DOCNAME PROJECTLIST
- Remove all documents whose names contain DOCNAME from all projects in PROJECTLIST. If the --exact option is provided, only delete documents whose names are exactly equal to DOCNAME.
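The three --method values of the import subcommand described above can be pictured as different ways of pairing documents with projects. The following sketch is one reading of that description, not PyWebAnno's implementation: the function plan_import is hypothetical, and it assumes that zip pairs the i-th document (in alphabetical order) with the i-th project and stops at the shorter of the two lists.

```python
import random


def plan_import(documents, projects, method="zip", rng=None):
    """Return (document, project) pairs for the chosen distribution method."""
    docs = sorted(documents)  # alphabetical order
    if method == "all":
        # Every document goes into every project.
        return [(doc, project) for project in projects for doc in docs]
    if method == "shufflezip":
        # Shuffle a copy of the project list, then pair as with "zip".
        if rng is None:
            rng = random.Random()
        projects = list(projects)
        rng.shuffle(projects)
    elif method != "zip":
        raise ValueError(f"unknown method: {method!r}")
    # Pair the i-th document with the i-th project.
    return list(zip(docs, projects))


# Hypothetical file names and the example project IDs from above.
documents = ["text02.txt", "text01.txt"]
projects = [203, 204]

print(plan_import(documents, projects, method="all"))
print(plan_import(documents, projects, method="zip"))
print(plan_import(documents, projects, method="shufflezip", rng=random.Random(1)))
```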
Examples
Show which documents are in certain projects
You can see which documents are currently online in a group of related projects:
pywebanno document list --projectlist pos_anno_projs.json
Add automatically generated annotations for use during curation
If you need to have annotations visible to curators but not to annotators, in order to assist them in making corrections, for instance, you can accomplish this with the add-annotations command:
pywebanno document add-annotations pos_anno_projs.json tagged_files/ autotagger
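Before running add-annotations, it can be useful to check which files in a directory like tagged_files/ would actually be matched to documents. The sketch below reproduces only the name-matching rule described for add-annotations (same name apart from the extension, optionally restricted by a prefix as with --only); the function match_annotated_files and the file names are hypothetical, and the comparison and upload of the annotations themselves are not covered.

```python
from pathlib import PurePath


def match_annotated_files(document_names, annotated_files, only=None):
    """Map each document name to the annotated file with the same stem."""
    by_stem = {PurePath(path).stem: path for path in annotated_files}
    matches = {}
    for name in document_names:
        # Skip documents that do not start with the requested prefix.
        if only is not None and not name.startswith(only):
            continue
        stem = PurePath(name).stem  # file name without its extension
        if stem in by_stem:
            matches[name] = by_stem[stem]
    return matches


# Hypothetical documents on the server and files in the local directory.
documents = ["text01.txt", "text02.txt", "other01.txt"]
annotated = ["tagged_files/text01.tsv", "tagged_files/other01.tsv"]

print(match_annotated_files(documents, annotated))
print(match_annotated_files(documents, annotated, only="text"))
```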
Layers
In general, annotation layers should be configured as a part of the construction of the project template, so that the layers will be correct and identical across several projects. However, this is not always possible and sometimes you need to make adjustments once the projects are already active. In that situation you might need one of these functions.
Subcommands
enable LAYERNAME PROJECTLIST
- Enable the layer with the name LAYERNAME for all projects in PROJECTLIST. Note that LAYERNAME here corresponds to what is called "Internal Name" in the WebAnno GUI. We use this name, because it allows the layer to be selected unambiguously.
disable LAYERNAME PROJECTLIST
- The opposite of enable.
rename LAYERNAME NEWNAME PROJECTLIST
- Set the UI name of LAYERNAME to NEWNAME for all projects in PROJECTLIST. The UI name is called simply "Name" in the WebAnno GUI and is the name that users/annotators will see.
Examples
Renaming an annotation layer
This invocation will rename the built-in POS layer to "STTS":
pywebanno layer rename de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS STTS projslist.json
Disabling an annotation layer
The following invocation would disable the built-in layer for named entity annotations:
pywebanno layer disable de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity projslist.json