Lexicographic Curation Workbench

A professional tool for creating and managing comprehensive dictionaries using the LIFT (Lexicon Interchange FormaT) standard. This Flask-based application provides full support for LIFT 0.13+ with extensive features designed for lexicographers, linguists, and language documentation specialists.

🌟 Key Features

Core Lexicographic Features

  • Multilingual Support: Every text field supports multiple writing systems simultaneously
  • Senses & Subsenses: Organize word meanings hierarchically to capture polysemy and semantic relationships
  • Examples & Usage: Rich contextual information with source language examples and translations
  • Pronunciation Management: Comprehensive phonetic documentation with IPA transcription, audio files, and TTS integration
  • Etymology Tracking: Document word origins and historical development
  • Variants & Allomorphs: Document different forms of the same lexeme following SIL Fieldworks approach
  • Lexical Relations: Create semantic networks connecting related entries (synonyms, antonyms, hypernyms, etc.)
  • Reversals: Essential for bilingual dictionaries; create L2→L1 lookup capability
  • Annotations & Messages: Editorial workflow and quality control with per-entry discussion threads
  • Edit History & Change Tracking: Every entry save automatically records a revision with a full JSON snapshot of the entry state. Field-level diffs are computed against the previous revision, showing exactly what changed (added, removed, or modified fields). Per-entry revision timelines are displayed directly on the edit page. The Change Analytics dashboard at /workbench/analytics aggregates edit activity across the dictionary with date-range filtering, by-field breakdowns, top editors, and a revision timeline chart, giving editors visibility into what changed, when, and by whom.
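
The field-level diff described under Edit History & Change Tracking can be pictured with a minimal sketch. This is not the application's own code: the function names `flatten` and `diff_snapshots`, the dotted-path convention, and the sample snapshot fields are all assumptions made for illustration.

```python
from typing import Any, Dict


def flatten(snapshot: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists into dotted paths, e.g. 'senses.0.gloss.en'."""
    if isinstance(snapshot, dict):
        items = snapshot.items()
    elif isinstance(snapshot, list):
        items = enumerate(snapshot)
    else:
        return {prefix: snapshot}
    flat: Dict[str, Any] = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


def diff_snapshots(old: Any, new: Any) -> Dict[str, Dict[str, Any]]:
    """Compare two JSON-like snapshots and group changes by kind."""
    old_flat, new_flat = flatten(old), flatten(new)
    changes: Dict[str, Dict[str, Any]] = {"added": {}, "removed": {}, "modified": {}}
    for path in sorted(old_flat.keys() | new_flat.keys()):
        if path not in old_flat:
            changes["added"][path] = new_flat[path]
        elif path not in new_flat:
            changes["removed"][path] = old_flat[path]
        elif old_flat[path] != new_flat[path]:
            # Store (old value, new value) so a UI could show both sides.
            changes["modified"][path] = (old_flat[path], new_flat[path])
    return changes


# Hypothetical snapshots of one entry before and after an edit.
previous = {"lexical_unit": {"en": "run"}, "senses": [{"gloss": {"en": "move fast"}}]}
current = {
    "lexical_unit": {"en": "run"},
    "senses": [{"gloss": {"en": "move quickly"}}],
    "note": "checked",
}
result = diff_snapshots(previous, current)
```

Flattening to paths keeps the comparison simple, at the cost of treating a reordered list as a series of modifications; a real implementation may handle list items differently.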

Editing & Curation

  • Entry Form: Rich multilingual editing with POS inheritance, variant relations, component relations, and subentries
  • Worksets: Query-based dynamic collections of entries with curation metadata (status, favorites, notes)
  • Bulk Operations: Batch update traits, POS tags, and other fields across multiple entries with preview mode
  • Merge & Split: Merge multiple entries into one or split an entry into multiple, with full undo/redo history
  • Auto-Save: Automatic form state persistence with undo/redo for entry edits
  • Keyboard Shortcuts: Full keyboard navigation and editing workflow

Import & Export

  • LIFT Import/Export: Full bidirectional LIFT 0.13+ support with merge/replace modes
  • SFM/Shoebox Import: Two-step import with marker auto-detection and interactive mapping UI
  • FieldWorks list.xml Import: Abbreviation import from FieldWorks
  • HTML Export: Generate browsable static HTML dictionaries with alphabetical navigation
  • Markdown Export: Export entries in Markdown format

Quality Assurance

  • Validation Engine: Multiple validation backends: Schematron (XSLT), Hunspell spelling, LanguageTool grammar, IPA pronunciation, real-time field validation
  • Validation Rules: Project-specific validation rules with admin UI
  • AI Proofreading: LLM-powered proofreading and drafting of entries (BYOK: bring your own API key)
  • Data Quality Dashboard: Overview of dictionary health and completeness

Customization & Workflow

  • Custom Fields: Extend LIFT to meet specific project needs with FieldWorks-compatible custom fields
  • Ranges Editor: Full CRUD for controlled vocabularies (grammatical categories, semantic domains, lexical relations)
  • Display Profiles: CSS-based entry rendering system with multiple profiles and custom styling
  • Project Settings: Per-project configuration for AI, SMTP, external services, and field visibility defaults
  • Project Setup Wizard: Bootstrap new projects with recommended ranges and configurations

Technical Features

  • RESTful API: Comprehensive JSON API for all dictionary operations
  • Advanced Search: XQuery-powered full-text search across all fields with filters and facets
  • Corpus Management: Lucene-based parallel corpus search (concordance) with management UI
  • Word Sketch: External ConceptSketch integration for collocation and grammar pattern analysis
  • Backup & Restore: Manual and scheduled backups of the BaseX XML database with undo/redo history
  • User Management: Role-based access control (ADMIN, MEMBER, VIEWER) with API key authentication
  • Swagger API Docs: Interactive API documentation via Flasgger at /apidocs/
  • Docker Support: Full docker-compose setup for all services including the Flask app
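
One way to think about the three roles listed under User Management is as an ordered hierarchy. The sketch below assumes that ordering (VIEWER below MEMBER below ADMIN) and invents the action names purely for illustration; the application's real permission model may differ.

```python
from enum import IntEnum


class Role(IntEnum):
    """Roles named in the feature list; the numeric ordering is an assumption."""
    VIEWER = 1
    MEMBER = 2
    ADMIN = 3


# Hypothetical actions and the minimum role assumed to be needed for each.
REQUIRED_ROLE = {
    "view_entry": Role.VIEWER,
    "edit_entry": Role.MEMBER,
    "manage_users": Role.ADMIN,
}


def can(user_role: Role, action: str) -> bool:
    """Return True if the role meets the minimum required for the action."""
    return user_role >= REQUIRED_ROLE[action]
```

With this layout, `can(Role.MEMBER, "edit_entry")` is true and `can(Role.VIEWER, "edit_entry")` is false.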

🛠️ Requirements

System Requirements

  • Python 3.8+
  • BaseX XML Database (version 9.0+, port 1984)
  • PostgreSQL (version 15+, port 5432)
  • Redis (for caching, port 6379)
  • Java Runtime Environment (for BaseX and Saxon)
  • Docker & Docker Compose (recommended for easy setup)

Optional External Services

  • ConceptSketch (port 8080): word sketch / collocation analysis service. Clone and run separately to enable the Word Sketch feature.
  • corpus-lucene-service (port 8082): parallel corpus concordance search. Clone and run separately to enable Corpus Management.
  • LanguageTool (port 8081): grammar and style checking (for validation engine)
  • Saxon XSLT Processor (included at tools/saxon/, auto-installed via install_saxon.sh): Schematron XSLT2 validation

Services Setup

Use Docker Compose to start all required services:

docker compose up -d

This starts: Flask app (port 5000), BaseX (ports 1984 TCP, 8984 HTTP), PostgreSQL (port 5432), Redis (port 6379), and a test PostgreSQL instance (port 5433).
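
To confirm that the services came up, a quick port probe can help. The following is a small sketch using only the standard library; the port numbers come from the list above, while the helper names `is_listening` and `check_services` and the service labels are made up for this example.

```python
import socket
from typing import Dict

# Ports as listed above for the docker compose setup.
SERVICE_PORTS: Dict[str, int] = {
    "flask": 5000,
    "basex-tcp": 1984,
    "basex-http": 8984,
    "postgres": 5432,
    "redis": 6379,
    "postgres-test": 5433,
}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost",
                   ports: Dict[str, int] = SERVICE_PORTS) -> Dict[str, bool]:
    """Probe every named port and report which ones accept connections."""
    return {name: is_listening(host, port) for name, port in ports.items()}
```

Calling `check_services()` returns a dictionary such as `{"flask": True, "redis": False, ...}`. An open port only shows that something is listening, not that the service is healthy.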

To start services individually with the provided scripts:

# Start BaseX
./start-basex.sh

# Or start all services (BaseX + Redis)
./start-services.sh

Ensure PostgreSQL is running separately (e.g. systemctl start postgresql or via your OS package manager).

📦 Installation

1. Clone the repository

git clone <repository-url>
cd LexCW

2. Set up Python virtual environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

3. Install Python dependencies

pip install -r requirements.txt

3.1 Node (for e2e tests)

If you intend to run the Playwright end-to-end tests, install Node dependencies and download the Playwright browsers. Note that the node_modules/ directory is ignored by Git (run npm ci after cloning to populate it).

# Install Node dependencies (deterministic install)
npm ci

# Install Playwright browsers (Chromium, Firefox, WebKit)
# Use --with-deps on Linux to ensure system packages are installed
npx playwright install --with-deps chromium firefox webkit

4. Configure environment variables

Copy the example environment file and update the settings:

cp .env.example .env

Edit the .env file with your configuration; see .env.example for all available variables. Key settings:

# BaseX
BASEX_HOST=localhost
BASEX_PORT=1984
BASEX_USERNAME=admin
BASEX_PASSWORD=admin
BASEX_DATABASE=dictionary

# PostgreSQL (worksets, users, settings)
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=dictionary_analytics
POSTGRES_USER=dict_user
POSTGRES_PASSWORD=dict_pass

# Redis (caching)
REDIS_HOST=localhost
REDIS_PORT=6379

# Flask
SECRET_KEY=your-secret-key-here

5. Start required services

Option A - Docker Compose (recommended):

docker compose up -d

Option B - Start manually:

# Start BaseX
./start-basex.sh

# Ensure PostgreSQL is running and create the database
createdb dictionary_analytics

# Redis (if installed locally)
redis-server

6. Run the application

python run.py

The application will be available at http://localhost:5000

🚀 Getting Started

Import Your Dictionary Data

  1. Go to Import/Export → Import LIFT
  2. Upload your LIFT file to begin working with your dictionary data
  3. The application supports LIFT 0.13 format with 91% element coverage

Browse and Edit Entries

  1. Click on Entries to view your lexicon
  2. Click on any entry to open the full editor
  3. Use the comprehensive editing interface to modify or add new entries

Use the Ranges Editor

Access the Tools → Ranges Editor to manage controlled vocabularies:

  • View and edit grammatical information categories
  • Manage semantic domains and lexical relations
  • Create custom classification systems

Export Your Work

Export your dictionary in multiple formats through Import/Export → Export:

  • LIFT Export: Full LIFT 0.13+ XML export (single file or dual file + ranges ZIP)
  • HTML Export: Generate browsable static HTML pages with CSS-driven entry rendering
  • Markdown Export: Export entries in Markdown format for documentation or publishing

📊 LIFT 0.13 Compliance

The application provides comprehensive LIFT 0.13+ support across all major element categories:

Element Coverage:

  • Entry Elements: All essential entry components (lexeme, citation, variants, alternate forms, notes, fields)
  • Sense Elements: Complete sense management with glosses, definitions, semantic domains, and subsenses
  • Example Elements: Full example support with translations and source language
  • Pronunciation: Complete phonetic documentation with IPA, audio files, and TTS integration
  • Etymology: Full etymology tracking with source, form, and gloss
  • Custom Fields: Extensive custom field support compatible with FieldWorks/FLEx
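
To make the element categories concrete, here is a minimal sketch that reads a tiny LIFT document with the standard library. It is not the application's parser: the sample entry is invented, `parse_entries` is a hypothetical helper, and only a few elements (lexical-unit, sense, grammatical-info, gloss) are handled.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List

# A made-up, minimal LIFT 0.13 document used only for this example.
SAMPLE_LIFT = """<lift version="0.13">
  <entry id="run_1">
    <lexical-unit>
      <form lang="en"><text>run</text></form>
    </lexical-unit>
    <sense id="run_1_s1">
      <grammatical-info value="Verb"/>
      <gloss lang="pl"><text>biegać</text></gloss>
      <gloss lang="de"><text>laufen</text></gloss>
    </sense>
  </entry>
</lift>"""


def texts_by_lang(elements: List[ET.Element]) -> Dict[str, str]:
    """Map each element's writing-system code (lang) to its <text> content.

    Only plain text is read; inline markup such as <span> is not handled.
    """
    return {el.get("lang", ""): el.findtext("text", default="") for el in elements}


def parse_entries(lift_xml: str) -> List[Dict[str, Any]]:
    """Extract ids, lexical-unit forms, and sense glosses from LIFT XML."""
    root = ET.fromstring(lift_xml)
    entries = []
    for entry in root.iter("entry"):
        lexical_unit = entry.find("lexical-unit")
        forms = lexical_unit.findall("form") if lexical_unit is not None else []
        senses = []
        for sense in entry.findall("sense"):
            grammatical_info = sense.find("grammatical-info")
            senses.append({
                "id": sense.get("id"),
                "pos": grammatical_info.get("value") if grammatical_info is not None else None,
                "glosses": texts_by_lang(sense.findall("gloss")),
            })
        entries.append({
            "id": entry.get("id"),
            "lexical_unit": texts_by_lang(forms),
            "senses": senses,
        })
    return entries


entries = parse_entries(SAMPLE_LIFT)
```

Keying forms and glosses by their `lang` attribute reflects how LIFT stores one text per writing system, which is what makes multilingual fields possible.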

🔄 Import/Export Capabilities

Import

| Format | Status | Details |
| --- | --- | --- |
| LIFT (.lift) | Full | Merge or replace modes, with optional .lift-ranges and list.xml |
| SFM/Shoebox | Full | Marker auto-detection with interactive mapping interface |
| FieldWorks list.xml | Full | Abbreviation import |

# Import a LIFT file with optional ranges
python -m scripts.import_lift path/to/lift_file.lift [path/to/lift_ranges.lift-ranges]

Export

| Format | Status | Details |
| --- | --- | --- |
| LIFT (.lift) | Full | Single file or dual file + ranges ZIP |
| HTML | Full | Browsable static HTML with CSS rendering and alphabetical navigation |
| Markdown | Full | Markdown format for documentation |
| Kindle (MOBI/AZW3) | Available | Script at tools/scripts/kindle_generator.py; generates Kindle-compatible dictionaries via the REST API |

# Export to a LIFT file
python -m scripts.export_lift path/to/output.lift

# Generate a Kindle dictionary (requires Calibre or KindleGen)
python tools/scripts/kindle_generator.py --format mobi --output my_dict.mobi

Extension Scripts

The scripts/ and tools/scripts/ directories contain utility scripts for extending and maintaining the application:

| Script | Purpose |
| --- | --- |
| tools/scripts/kindle_generator.py | Generate Kindle MOBI/AZW3 dictionaries from the API |
| tools/scripts/api_client.py | Programmatic REST API client for batch operations |
| tools/scripts/ai_quality_control.py | AI-powered quality checks on dictionary data |
| scripts/import_lift.py | CLI LIFT import (alternative to web UI) |
| scripts/export_lift.py | CLI LIFT export (alternative to web UI) |
| scripts/validate_xml_compatibility.py | Validate LIFT XML compatibility |

🌐 API Endpoints

Full interactive API documentation is available at /apidocs/ (Swagger/OpenAPI via Flasgger). Key endpoint groups:

Entry Management (/api/entries/)

  • GET / - List entries with pagination and search
  • GET /{id} - Get a specific entry
  • POST / - Create a new entry
  • PUT /{id} - Update an existing entry
  • DELETE /{id} - Delete an entry
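
The entry endpoints above could be called from a script along the following lines. This is a sketch under stated assumptions: the helper `entry_request` is hypothetical, the `X-API-Key` header name is a guess (check /apidocs/ for the real authentication scheme), and the JSON payload shape is not specified here.

```python
import json
from typing import Any, Dict, Optional
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://localhost:5000"  # default address from the installation steps


def entry_request(method: str,
                  entry_id: Optional[str] = None,
                  payload: Optional[Dict[str, Any]] = None,
                  params: Optional[Dict[str, Any]] = None,
                  api_key: Optional[str] = None,
                  base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a request against /api/entries/."""
    url = f"{base_url}/api/entries/"
    if entry_id is not None:
        url += quote(entry_id, safe="")  # escape spaces, slashes, etc. in ids
    if params:
        url += "?" + urlencode(params)
    headers = {"Accept": "application/json"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if api_key:
        headers["X-API-Key"] = api_key  # assumed header name, not confirmed
    return Request(url, data=data, headers=headers, method=method)
```

A request built this way can be sent with `urllib.request.urlopen(entry_request("GET", entry_id="some_id"))`. Building the request separately from sending it makes the URL and headers easy to inspect before anything touches the server.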

Search (/api/search/)

  • GET / - Full-text search across all fields
  • GET /ranges - Get range definitions and controlled vocabularies
  • GET /ranges/{id} - Get values for a specific range

Export (/api/export/)

  • GET /lift - Export dictionary to LIFT XML
  • GET /html - Export dictionary to HTML
  • GET /download/{file} - Download a generated export file

AI Assistance (/api/ai/)

  • POST /proofread - AI proofreading of an entry
  • POST /draft - AI drafting of a new entry from description
  • POST /batch-proofread - Batch proofread multiple entries

Validation (/api/validation/)

  • GET /entry/{id} - Validate a specific entry
  • GET /dictionary - Validate the entire dictionary
  • POST /check - Run validation checks

Worksets (/api/worksets)

  • GET / - List worksets
  • POST / - Create a workset
  • POST /{id}/entries - Manage workset entries

Other Endpoints

  • GET /api/stats - Dictionary statistics and entry counts
  • POST /api/backup/create - Create database backup
  • POST /api/merge-split/merge - Merge entries
  • POST /api/merge-split/split - Split an entry
  • GET/POST /api/corpus/search - Lucene-based parallel corpus search
  • GET /api/profiles - Display profile management
  • GET /api/query-builder/fields - Available search fields
  • GET/POST /api/bulk/query - Query and bulk-operate on entries
  • POST /api/auth/login - User authentication
  • GET /api/projects/{id}/validation-rules - Validation rules
  • GET /api/lift/elements - LIFT element registry

🔧 Development

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=app tests/

# Run JavaScript tests
npm test

# Run end-to-end tests (requires Playwright browsers)
npm run test:e2e

Development Commands

# Format Python code with black
black .

# Lint Python code
flake8

# Type checking
mypy .

# Format JavaScript code
npm run format:js

# Lint JavaScript code
npm run lint:js

⚠️ Status

This application is in active development. All core features are operational: full LIFT import/export, entry editing, search, validation, AI assistance, user management, backup/restore, and multiple export formats.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

For technical questions about LIFT format:

For lexicographic resources and guidance:

For application-specific support, contact your system administrator or open an issue in the repository.
