PyWebAnno

Introduction

PyWebAnno is a command-line tool that is like a for-loop for WebAnno: it can perform a number of the operations you would normally do using the GUI, but for many items at once. So, given a list of users and projects, it could modify the permissions of all of these users in these projects or distribute these users across the projects in the list. PyWebAnno does this either via the semi-official remote API or the definitely unofficial and frowned-upon means of modifying the database directly.

If you are using WebAnno for an annotation project with many users or in an educational setting, you may need many WebAnno projects or a large number of user accounts. Though these tasks are quickly accomplished in the GUI for individual projects, they become tedious and time-consuming if you need to, say, create 50 new user accounts or 20 projects that are configured in a specific way. Doing this by hand can also lead to errors and inconsistencies: your new projects might not all end up with the same settings, and in such a scenario even figuring out which projects need correcting can become prohibitively difficult. These are the situations where PyWebAnno can be useful.

Installation

Installation using pip:

pip install [--user] git+https://git.noc.rub.de/ajroussel/pywebanno

Preliminaries

Permissions

Before you can start, you need (1) a WebAnno user account that is allowed to use WebAnno's remote API and (2) a MySQL user that is allowed to read and write the WebAnno database.

Caution!

Note that the high level of permissions required means that you can break a lot of things very quickly when using PyWebAnno, so be careful.

In order to be able to use WebAnno's remote API, a user must have the role ROLE_REMOTE. This user will also need to have the role ROLE_ADMIN in order to actually enact the various operations PyWebAnno is capable of performing. Both of these roles can be assigned by an administrator on WebAnno's "Users" page.

Global Options

All of the commands and subcommands support a --help option, with which you can get a summary of what arguments and options they support.

-v, --verbosity
  Print more (+1) or less (-1) stuff to console.
-n, --dry-run
  Don't actually do anything.
-u, --user TEXT
  Username for WebAnno
-p, --password TEXT
  Webanno password
--db-user TEXT
  DB username
--db-password TEXT
  DB root password
-f, --config-file TEXT
  Read configuration from this file
--version
  Show the version and exit.

Probably the most important of these global options are --dry-run and --config-file. Since PyWebAnno does change the state of your WebAnno instance, you'll want to be sure that the command as executed will make the changes you actually want. When you use the --dry-run option, as much of the requested operation as possible is done without actually changing anything for real. The changes that would be made will be listed in the program output on the terminal.

In general, whenever PyWebAnno requires user credentials, it will ask for them. But because it quickly becomes tedious to provide credentials for both WebAnno itself and the database every time you run an operation, PyWebAnno lets you supply this information in a couple of different ways.

It will first try to read this information from the settings file, if one is provided. If the required information is not found there or no file was provided, then it will use the information provided on the command line. Finally, if some credential is still missing after both of these steps, it will ask.
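The lookup order just described can be sketched in Python as follows. This is a minimal illustration under assumptions, not PyWebAnno's actual code: the helper name `resolve_credential` and its parameters are hypothetical, and the settings file is assumed to be read with the standard-library `configparser`.

```python
import configparser
import getpass
from typing import Callable, Optional


def resolve_credential(
    config: Optional[configparser.ConfigParser],
    section: str,
    key: str,
    cli_value: Optional[str],
    prompt: Callable[[str], str] = getpass.getpass,
) -> str:
    """Hypothetical helper: settings file first, then command line, then ask."""
    # 1. Settings file, if one was provided and it contains the key.
    #    raw=True keeps characters such as '%' in passwords literal.
    if config is not None and config.has_option(section, key):
        return config.get(section, key, raw=True)
    # 2. Value given on the command line.
    if cli_value is not None:
        return cli_value
    # 3. Still missing: ask interactively (getpass does not echo input).
    return prompt(f"{section} {key}: ")
```

Note that in this order a value in the settings file wins over one given on the command line. That matches the description above, but it is the reverse of what many command-line tools do.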

Settings Files

The settings files for PyWebAnno use the INI file format. They have four sections, each grouping related settings; these are explained in more detail below. An example settings file is provided in defaults.ini in this package's root directory.

Note that while the settings file can contain passwords, it is probably not a good idea to do this in most cases. It is recommended to still enter passwords interactively, even when the other settings are provided from a settings file. If you really want to avoid this, you could in theory pipe the password in on the command line:

pywebanno -p <(pass webanno) project list

The [general] Section

output_encoding
Encoding to use for information printed to the terminal.

The [webanno] Section

user
Username for the account PyWebAnno should use. This is the user that was granted ROLE_REMOTE previously.
password
Password for the account specified by user.
url
Location of the WebAnno instance to connect to.

The [database] Section

user
Database user that will perform operations on the database.
url
URL of the WebAnno database server. (default: localhost)
port
Port that the WebAnno database server uses. (default: 3306)
password
Password for the account specified by user.
dbname
Name of the database that WebAnno uses.

The [notification] Section

from_address
The email address from which notifications will be sent.
smtp_user
The username required to log in to the SMTP server.
smtp_password
Password for the account specified by smtp_user.
smtp_url
Location of the SMTP server that will send the emails.
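To see how the four sections fit together, the following sketch parses a small settings file with Python's standard `configparser` and fills in the documented defaults for the database host and port. All values are placeholders, passwords are deliberately left out as recommended above, and the helper `load_settings` is hypothetical; PyWebAnno does not necessarily read its settings this way.

```python
import configparser

# Placeholder values; only the section and key names come from the text above.
EXAMPLE_SETTINGS = """
[general]
output_encoding = utf-8

[webanno]
user = remote_admin
url = https://webanno.example.org

[database]
user = webanno_db
dbname = webanno

[notification]
from_address = annotation-team@example.org
smtp_user = mailer
smtp_url = smtp.example.org
"""


def load_settings(text: str) -> configparser.ConfigParser:
    config = configparser.ConfigParser()
    config.read_string(text)
    # Apply the documented defaults for the database connection
    # when they are not given in the file.
    if not config.has_section("database"):
        config.add_section("database")
    config["database"].setdefault("url", "localhost")
    config["database"].setdefault("port", "3306")
    return config


settings = load_settings(EXAMPLE_SETTINGS)
print(settings.sections())
print(settings["database"]["url"], settings.getint("database", "port"))
```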

Userlists and Projectlists

Many, if not most, of the commands provided by PyWebAnno operate on userlists and/or projectlists. These are JSON files with a particular format.

Userlist

A userlist for the purposes of PyWebAnno is a file that contains exactly one JSON array of JSON objects. Each object must contain at least the key "username", whose value is a unique string. To import users into the database, each object also needs a "password" key, and to notify users of their passwords you will additionally need "email"; neither key is required for the other operations. A userlist might look like this:

[{"username": "newuser1",
  "password": "pdq4987d",
  "name": "Vorname Nachname",
  "email": "newuser1@domain.org"}, ...

In theory, you may include any additional keys you find useful in the userlist. Such keys could come in handy for email notification templates, for example; otherwise they are ignored and shouldn't cause any issues when using the userlist with other commands.
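If you build userlists by hand or with other tools, it can help to check them before handing them to PyWebAnno. The following sketch checks the rules described above; `check_userlist` and its flags are hypothetical, and PyWebAnno's own validation may differ.

```python
import json
from typing import Any


def check_userlist(text: str, for_import: bool = False,
                   for_notify: bool = False) -> list[dict[str, Any]]:
    """Check a userlist against the rules above (illustrative sketch)."""
    users = json.loads(text)
    # Exactly one JSON array whose items are JSON objects.
    if not isinstance(users, list) or not all(isinstance(u, dict) for u in users):
        raise ValueError("a userlist must be one JSON array of objects")
    seen: set[str] = set()
    for user in users:
        name = user.get("username")
        if not isinstance(name, str):
            raise ValueError(f"missing or non-string 'username': {user!r}")
        if name in seen:
            raise ValueError(f"duplicate username: {name}")
        seen.add(name)
        # "password" and "email" are only needed for some operations.
        if for_import and "password" not in user:
            raise ValueError(f"{name}: 'password' is required for import")
        if for_notify and "email" not in user:
            raise ValueError(f"{name}: 'email' is required for notification")
    # Any additional keys are left untouched.
    return users
```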

Projectlist

A projectlist, like a userlist, contains exactly one JSON array, which consists of a number of JSON objects. In a projectlist, only the "id" key is required, though the "name" key is usually also present, and a "template" key is often useful for figuring out what sort of project a given project is, if it was imported from a template. A projectlist might look like this:

[{"id": 203,
  "name": "namedentities_proj_1"},
 {"id": 204,
  "name": "namedentities_proj_2"}, ...
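A projectlist can be read the same way in your own scripts. This sketch (the helper name `project_ids` is hypothetical) relies only on the required "id" key, since "name" and "template" may be missing.

```python
import json


def project_ids(text: str) -> list[int]:
    """Return the project IDs from a projectlist; only "id" is required."""
    projects = json.loads(text)
    if not isinstance(projects, list):
        raise ValueError("a projectlist must be one JSON array of objects")
    ids = []
    for project in projects:
        if "id" not in project:
            raise ValueError(f"project without 'id': {project!r}")
        ids.append(int(project["id"]))
    return ids
```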

Usage

The following sections describe the main commands this program provides, which correspond to the type of WebAnno entity they modify.

Projects

Whenever PyWebAnno creates new projects, it will also produce a JSON list of the generated projects (a projectlist), which can be used to perform various other tasks for these projects later on. Though you can have a similar list generated using the list command and filtering by a name prefix, it's generally a good idea to hold on to this list, either using the -o / --outputfile option or routing stdout to a file, to more easily manage the new projects.

Subcommands

create NUMPROJECTS
Generate NUMPROJECTS new projects. Projects can either be created from scratch (using all of the WebAnno default settings) or using a template (provided via the --template option) to configure various settings that you'd like to be the same for all of the new projects. Once the new projects have been created, a projectlist containing information about the newly created projects will be printed on stdout, or if you provide a filepath with -o, --outputfile, written to the given path. By default new projects will be named "project" plus some integer, but you can replace the "project" prefix with one of your choosing with the --prefix option. You can choose a starting integer other than 1 using the --startid option.
list
Show existing projects on the server. By default this will show all of the existing projects, but you can limit which projects will be shown by specifying a prefix (--prefix). The projects will be listed in a tabular format or a JSON format (--json) equivalent to a projectlist.
rollback PROJECTLIST
Remove all of the projects in PROJECTLIST from a WebAnno server.
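The naming scheme used by create, and the kind of prefix selection that list performs, can be sketched like this. The helper names are hypothetical, and treating --prefix as a plain string prefix match on the project name is an assumption based on the descriptions above.

```python
def new_project_names(numprojects: int, prefix: str = "project",
                      startid: int = 1) -> list[str]:
    """Names for `project create`: the prefix directly followed by an integer."""
    return [f"{prefix}{i}" for i in range(startid, startid + numprojects)]


def filter_by_prefix(projects: list[dict], prefix: str) -> list[dict]:
    """Assumed behaviour of `project list --prefix`: plain prefix match on "name"."""
    return [p for p in projects if p.get("name", "").startswith(prefix)]


print(new_project_names(3, prefix="group"))  # ['group1', 'group2', 'group3']
```

A plain prefix match also picks up unrelated projects whose names happen to start the same way (for instance, "group" matches "groups_old"), which is one more reason to keep the projectlist that create produces.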

Examples

Delete old users and projects

This example illustrates how you might clean up at the conclusion of an annotation project. The rollback functions can be used to delete all of the users or projects listed in the files given. When you first create a set of WebAnno projects and users, each of those operations results in a JSON list, which contains useful information about the newly created entities. Among other things, these lists are handy for making changes to a particular set of entities later. Here, we supply a JSON-format list to the project and user subcommands in order to delete the projects and users contained in the lists:

pywebanno project rollback projectslist.json
pywebanno user rollback userlist.json

Generate new projects

This command will create 21 new projects based on the template template.zip. The new projects will be numbered sequentially and the names will be prefixed with "group": "group1", "group2", etc.:

pywebanno project create --prefix group --template template.zip 21

Users

Users are managed via direct manipulation of the WebAnno database. Yes, this is as dangerous as it sounds. These functions must be run on a machine that can reach the database.

Subcommands

assign USERLIST PROJECTLIST
Assign users to projects. Users will be distributed as evenly as possible across the given projects in a randomized order.
create
Create new user accounts from scratch. Usernames for the new users consist of a prefix (--prefix), a separator (--sep), and an integer, assigned sequentially starting with 1 or, optionally, with the value of --startid. New accounts are enabled by default, but you can set this explicitly with the --enabled and --disabled options. By default, information about the new user accounts is printed on stdout, but you can specify a file for this output with the -o, --outputfile option.
import USERLIST
Create new user accounts from the provided USERLIST.
disable USERLIST
Disable all of the user accounts in the provided USERLIST.
enable USERLIST
The opposite of disable.
list
Show users currently in the WebAnno database. Several options let you choose which users are shown. With --projectlist (-p), you can select all of the users that belong to any of the projects in the given projectlist. Alternatively, you can provide the relevant projects individually using the -i or --include-project option one or more times. The users shown can also be limited by permission level (-l, --level) or by a username prefix (--prefix). If you just want to see all of the users currently on the WebAnno server, use the --all option. The --json option produces JSON-format output that can be used as a userlist.
notify USERLIST TEMPLATE

Send emails containing the information in TEMPLATE to the users in USERLIST. Optionally you can provide a value for the FROM field of the emails to be sent using the --from-address option. Though this can be provided here, it's probably more convenient in most cases to use a settings file.

The template should be a plain text file, which becomes the body of the message to be sent. It may contain Python formatting directives: each entry in USERLIST is passed to the str.format() function as given. Each entry should contain at least the "email" key, which is the address the message is sent to. Beyond this, any other information in the entry may be incorporated into the message content in the same way.

set-perms [USERNAMES]...
Modify the project roles of the given users. The users can either be given as arguments on the command line or in a provided userlist (-u, --userlist) or projectlist (-p, --projectlist). When a userlist is provided, the permissions are set for the listed users; when a projectlist is provided, they are set for all users in any of the listed projects. With --add you can specify which of the three roles (user, manager, or curator) should be assigned to the given set of users, and with --rm you can remove the given roles.
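The username scheme used by create can be illustrated as follows. The helper `new_user_entries` is hypothetical and only produces the "username" key of each userlist entry; the real output of create contains further information, which this sketch does not try to reproduce.

```python
def new_user_entries(prefix: str, sep: str, count: int,
                     startid: int = 1) -> list[dict[str, str]]:
    """Userlist-style entries for `user create`: prefix + separator + integer."""
    return [
        {"username": f"{prefix}{sep}{i}"}
        for i in range(startid, startid + count)
    ]


print(new_user_entries("syntax", "_", 2))
```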

Examples

Generate user accounts

With the import command, you can add users to your WebAnno instance from a pre-existing USERLIST. However, the user data you have may not already be in the JSON format that PyWebAnno requires. If the data is in a tabular format, such as CSV exported from a spreadsheet application, you can use csvkit and jq to perform this conversion with relative ease and in a way that is adaptable to your particular requirements. This example shows what such a conversion step could look like. Here the CSV file contains a header, which provides the keys for the output JSON. Since the items in the header are uppercase and PyWebAnno requires lowercase keys, the conversion includes a downcase transformation. The -I option of csvjson (from csvkit) disables type conversion, so all values are interpreted as strings and leading zeroes in ID numbers are preserved. There is also a substitution step that removes the hyphen from "e-mail":

csvjson -I userlist_syntax_wise2021.csv |
jq 'map( with_entries( .key |= (ascii_downcase | sub("-";""))))' > userlist.json
pywebanno user import userlist.json
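If installing csvkit and jq is not an option, roughly the same conversion can be done with Python's standard library. This is a sketch, not part of PyWebAnno: the function name is hypothetical, and it mirrors the pipeline above by keeping all values as strings, lowercasing the header keys, and removing the first hyphen from each key.

```python
import csv
import io


def csv_to_userlist(csv_text: str) -> list[dict[str, str]]:
    """Rough Python equivalent of the csvjson -I | jq pipeline above."""
    reader = csv.DictReader(io.StringIO(csv_text))
    users = []
    for row in reader:
        # Values stay strings (like csvjson -I), so leading zeros survive.
        # Keys are lowercased and their first hyphen removed, like
        # ascii_downcase | sub("-";"") in jq. Surplus unnamed columns
        # (key None) are skipped.
        users.append({
            key.lower().replace("-", "", 1): value
            for key, value in row.items()
            if key is not None
        })
    return users
```

To convert a real file, open it with `open(path, newline="", encoding="utf-8")` (the encoding is an assumption about your export), pass its contents to `csv_to_userlist`, and write the result with `json.dump`. Unlike jq's `ascii_downcase`, Python's `str.lower()` also lowercases non-ASCII letters.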
Add users to projects

The users in userlist.json will be randomly distributed among the projects listed in projectslist.json:

pywebanno user assign userlist.json projectslist.json
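How exactly PyWebAnno randomizes the assignment is not described here. One straightforward way to spread users "as evenly as possible in a randomized order" is to shuffle the users and then deal them out round-robin; the function below is a hypothetical sketch of that idea, not PyWebAnno's implementation.

```python
import random
from typing import Optional


def assign_evenly(usernames: list[str], project_ids: list[int],
                  seed: Optional[int] = None) -> dict[int, list[str]]:
    """Spread users across projects as evenly as possible, in random order."""
    if not project_ids:
        raise ValueError("need at least one project")
    shuffled = list(usernames)
    random.Random(seed).shuffle(shuffled)  # randomized order
    assignment: dict[int, list[str]] = {pid: [] for pid in project_ids}
    # Round-robin over the shuffled users, so project sizes differ by at most one.
    for i, user in enumerate(shuffled):
        assignment[project_ids[i % len(project_ids)]].append(user)
    return assignment
```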
Send account information to new users

Once you've created some new user accounts, you'll need to communicate to the users the information they need to log in. This can be done with the notify command:

pywebanno user notify userlist.json account-info.txt

This command requires the user information generated by the create command, which can either be passed to stdin or read from a file.

The user notify function is also useful for notifying users about system downtime or supplying information about the annotation tasks, among other things.
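The templating step can be sketched with plain `str.format()`. Here each userlist entry is passed as keyword arguments, which is an assumption about how PyWebAnno calls `str.format()`; the template text and the helper `render_message` are hypothetical, and actually sending the message over SMTP is left out.

```python
# Hypothetical template; the placeholders refer to userlist keys.
TEMPLATE = """Hello {name},

your WebAnno account has been created.
Username: {username}
Password: {password}
"""


def render_message(template: str, user: dict) -> tuple[str, str]:
    """Return (recipient, body) for one userlist entry."""
    # The "email" key is required: it is the address the message goes to.
    recipient = user["email"]
    # Every key of the entry is available as a {placeholder}.
    body = template.format(**user)
    return recipient, body
```

With this approach, extra keys in an entry are simply ignored, a placeholder without a matching key raises a `KeyError`, and literal braces in the template must be doubled (`{{` and `}}`).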

Documents

Subcommands

add-annotations PROJECTLIST DOCUMENTDIR USERNAME
DOCUMENTDIR is a folder that contains one file for each document in the projects in PROJECTLIST. This command goes through the documents in the given projects and, for each one, selects the file in DOCUMENTDIR whose name matches the document's name (apart from the extension, which is likely to differ in the annotated version and is therefore ignored). The additional annotations in this file, which are not present in the original document, are uploaded as the annotations of USERNAME. Optionally, you can restrict the upload to a subset of the documents with the --only option: only documents whose names begin with the given prefix will have new annotations uploaded.
delete-annotations PROJECTLIST USERNAME
Delete the annotations by USERNAME for all projects in PROJECTLIST. As with add-annotations, here too you can control which documents have annotations removed using the --only option.
import DOCUMENTDIR PROJECTLIST

Add the documents in DOCUMENTDIR to the projects in PROJECTLIST. With the --method option, you can select how this should be done: all will add all of the documents in DOCUMENTDIR to each of the projects. zip (the default) will add each document in alphabetical order to one project. shufflezip first shuffles the list of projects, thus adding each document to a random project from PROJECTLIST. The --format option specifies the format of the documents in DOCUMENTDIR.

Note

If you plan to add annotations to these documents later on, it is best to also upload the unannotated base texts in the same format you plan to use for the annotations. In my experience, this is the best way to avoid issues with character offsets, for which WebAnno seems to have very strict requirements.

list
Show documents. As with user list, you can select which documents are shown in three ways: by providing the IDs of the project(s) whose documents should be listed individually (-i, --include-project), by providing a projectlist containing those projects (-p, --projectlist), or by specifying that all documents currently on the server should be shown (-a, --all). With the --json option, the output is in JSON format.
remove DOCNAME PROJECTLIST
Remove all documents whose names contain DOCNAME from all projects in PROJECTLIST. If the --exact option is provided, only delete documents whose names are exactly equal to DOCNAME.
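The three --method values of import can be sketched as a plan of (document, project) pairs. The helper `plan_import` is hypothetical; pairing the documents with the projects in projectlist order, and what happens when the numbers of documents and projects differ, are assumptions (here, like Python's `zip`, the plan stops at the shorter list).

```python
import random
from typing import Optional


def plan_import(documents: list[str], project_ids: list[int],
                method: str = "zip",
                seed: Optional[int] = None) -> list[tuple[str, int]]:
    """Return (document, project_id) pairs for the --method values (sketch)."""
    docs = sorted(documents)  # alphabetical order
    if method == "all":
        # Every document goes into every project.
        return [(doc, pid) for pid in project_ids for doc in docs]
    projects = list(project_ids)
    if method == "shufflezip":
        # Shuffle the projects, so each document lands in a random project.
        random.Random(seed).shuffle(projects)
    elif method != "zip":
        raise ValueError(f"unknown method: {method}")
    # Pair the i-th document with the i-th project; zip() stops at the
    # shorter list, which is an assumption of this sketch.
    return list(zip(docs, projects))
```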

Examples

Show which documents are in certain projects

You can see which documents are currently online in a group of related projects:

pywebanno document list --projectlist pos_anno_projs.json

Add automatically generated annotations for use during curation

If you need to have annotations visible to curators but not to annotators, in order to assist them in making corrections, for instance, you can accomplish this with the add-annotations command:

pywebanno document add-annotations pos_anno_projs.json tagged_files/ autotagger
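The matching rule used by add-annotations can be sketched as follows: documents and files are compared by name with the extension stripped, and --only keeps only documents whose names start with the given prefix. The helper is hypothetical, and comparing with `pathlib.Path.stem` (which strips only the last extension) is an assumption.

```python
from pathlib import Path
from typing import Optional


def match_annotation_files(document_names: list[str], files: list[str],
                           only: Optional[str] = None) -> dict[str, str]:
    """Map each WebAnno document name to the file with the same stem."""
    by_stem = {Path(f).stem: f for f in files}
    matches = {}
    for doc in document_names:
        # --only: skip documents whose name does not start with the prefix.
        if only is not None and not doc.startswith(only):
            continue
        stem = Path(doc).stem  # extensions are ignored on both sides
        if stem in by_stem:
            matches[doc] = by_stem[stem]
    return matches
```

To try it on a real folder, the file list could come from `[p.name for p in Path("tagged_files").iterdir() if p.is_file()]`.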

Layers

In general, annotation layers should be configured as a part of the construction of the project template, so that the layers will be correct and identical across several projects. However, this is not always possible and sometimes you need to make adjustments once the projects are already active. In that situation you might need one of these functions.

Subcommands

enable LAYERNAME PROJECTLIST
Enable the layer with the name LAYERNAME for all projects in PROJECTLIST. Note that LAYERNAME here corresponds to what is called "Internal Name" in the WebAnno GUI. We use this name because it allows the layer to be selected unambiguously.
disable LAYERNAME PROJECTLIST
The opposite of enable.
rename LAYERNAME NEWNAME PROJECTLIST
Set the UI name of LAYERNAME to NEWNAME for all projects in PROJECTLIST. The UI name is called simply "Name" in the WebAnno GUI and is the name that users/annotators will see.

Examples

Renaming an annotation layer

This invocation will rename the built-in POS layer to "STTS":

pywebanno layer rename de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.POS STTS projslist.json

Disabling an annotation layer

The following invocation would disable the built-in layer for named entity annotations:

pywebanno layer disable de.tudarmstadt.ukp.dkpro.core.api.ner.type.NamedEntity projslist.json