✨ We invite you to take part in the DipDup Community Survey April 2022! ✨


        ____   _         ____              
       / __ \ (_)____   / __ \ __  __ ____ 
      / / / // // __ \ / / / // / / // __ \
     / /_/ // // /_/ // /_/ // /_/ // /_/ /
    /_____//_// .___//_____/ \__,_// .___/ 
             /_/                  /_/      

DipDup is a Python framework for building indexers of Tezos smart contracts. It helps developers focus on the business logic instead of writing boilerplate to store and serve data. DipDup-based indexers are selective, which means only required data is requested. This approach allows achieving faster indexing times and decreased load on the APIs DipDup uses.

This project is maintained by the Baking Bad team. Development is supported by Tezos Foundation.

Quickstart

This page will guide you through the steps to get your first selective indexer up and running in a few minutes without getting too deep into the details.

Let's create an indexer for the tzBTC FA1.2 token contract. Our goal is to save all token transfers to the database and then calculate some statistics of its holders' activity.

A Linux environment with Python 3.10+ installed is required to use DipDup.

Create a new project

From template

Cookiecutter is a cool Jinja2 wrapper for initializing hello-world templates of various frameworks and toolkits interactively. Install the python-cookiecutter package system-wide, then call:

cookiecutter https://github.com/dipdup-net/cookiecutter-dipdup

From scratch

We advise using the poetry package manager for new projects.

poetry init
poetry add dipdup
poetry shell

Write a configuration file

DipDup configuration is stored in YAML files of a specific format. Create a new file named dipdup.yml in your current working directory with the following content:

spec_version: 1.2
package: demo_tzbtc

database:
  kind: sqlite
  path: demo_tzbtc.sqlite3
  
contracts:
  tzbtc_mainnet:
    address: KT1PWx2mnDueood7fEmfbBDKx1D9BAnnXitn
    typename: tzbtc

datasources:
  tzkt_mainnet:
    kind: tzkt
    url: https://api.tzkt.io
    
indexes:
  tzbtc_holders_mainnet:
    kind: operation
    datasource: tzkt_mainnet
    contracts: 
      - tzbtc_mainnet
    handlers:
      - callback: on_transfer
        pattern:
          - destination: tzbtc_mainnet
            entrypoint: transfer
      - callback: on_mint
        pattern:
          - destination: tzbtc_mainnet
            entrypoint: mint

Initialize project tree

Now it's time to generate typeclasses and callback stubs. Run the following command:

dipdup init

DipDup will create a Python package demo_tzbtc having the following structure:

demo_tzbtc
├── graphql
├── handlers
│   ├── __init__.py
│   ├── on_mint.py
│   └── on_transfer.py
├── hooks
│   ├── __init__.py
│   ├── on_reindex.py
│   ├── on_restart.py
│   ├── on_index_rollback.py
│   └── on_synchronized.py
├── __init__.py
├── models.py
├── sql
│   ├── on_reindex
│   ├── on_restart
│   ├── on_index_rollback
│   └── on_synchronized
└── types
    ├── __init__.py
    └── tzbtc
        ├── __init__.py
        ├── parameter
        │   ├── __init__.py
        │   ├── mint.py
        │   └── transfer.py
        └── storage.py

That's a lot of files and directories! But don't worry, we will need only models.py and handlers modules in this guide.

Define data models

Our schema will consist of a single model Holder having several fields:

  • address – account address
  • balance – in tzBTC
  • volume – total transfer/mint amount passed through the account
  • tx_count – number of transfers/mints
  • last_seen – time of the last transfer/mint

Put the following content in the models.py file:

from tortoise import Model, fields


class Holder(Model):
    address = fields.CharField(max_length=36, pk=True)
    balance = fields.DecimalField(decimal_places=8, max_digits=20, default=0)
    volume = fields.DecimalField(decimal_places=8, max_digits=20, default=0)
    tx_count = fields.BigIntField(default=0)
    last_seen = fields.DatetimeField(null=True)

Implement handlers

Everything's ready to implement the actual indexing logic.

Our task is to index all the balance updates, so we'll start with a helper method to handle them. Create a file named on_balance_update.py in the handlers package with the following content:

from decimal import Decimal
import demo_tzbtc.models as models


async def on_balance_update(
    address: str,
    balance_update: Decimal, 
    timestamp: str
) -> None:
    holder, _ = await models.Holder.get_or_create(address=address)
    holder.balance += balance_update
    holder.tx_count += 1
    holder.last_seen = timestamp
    assert holder.balance >= 0, address
    await holder.save()

Three methods of the tzBTC contract can alter token balances: transfer, mint, and burn. The last one is omitted in this tutorial for simplicity. Edit the corresponding handlers to call the on_balance_update method with data from matched operations:

on_transfer.py

from typing import Optional
from decimal import Decimal

from dipdup.models import Transaction
from dipdup.context import HandlerContext

import demo_tzbtc.models as models

from demo_tzbtc.types.tzbtc.parameter.transfer import TransferParameter
from demo_tzbtc.types.tzbtc.storage import TzbtcStorage
from demo_tzbtc.handlers.on_balance_update import on_balance_update


async def on_transfer(
    ctx: HandlerContext,
    transfer: Transaction[TransferParameter, TzbtcStorage],
) -> None:
    if transfer.parameter.from_ == transfer.parameter.to:
        # NOTE: Internal tzBTC transaction
        return

    amount = Decimal(transfer.parameter.value) / (10 ** 8)
    await on_balance_update(
        address=transfer.parameter.from_,
        balance_update=-amount,
        timestamp=transfer.data.timestamp,
    )
    await on_balance_update(address=transfer.parameter.to,
                            balance_update=amount,
                            timestamp=transfer.data.timestamp)

on_mint.py

from typing import Optional
from decimal import Decimal

from dipdup.models import Transaction
from dipdup.context import HandlerContext

import demo_tzbtc.models as models

from demo_tzbtc.types.tzbtc.parameter.mint import MintParameter
from demo_tzbtc.types.tzbtc.storage import TzbtcStorage
from demo_tzbtc.handlers.on_balance_update import on_balance_update


async def on_mint(
    ctx: HandlerContext,
    mint: Transaction[MintParameter, TzbtcStorage],
) -> None:
    amount = Decimal(mint.parameter.value) / (10 ** 8)
    await on_balance_update(
        address=mint.parameter.to,
        balance_update=amount,
        timestamp=mint.data.timestamp
    )

And that's all! We can run the indexer now.

Run your indexer

dipdup run

DipDup will fetch all the historical data and then switch to realtime updates. Your application data has been successfully indexed!

Getting started

This part of the docs covers the same features as the Quickstart article but focuses more on the details.

Installation

This page covers the installation of DipDup in different environments.

Host requirements

A Linux environment with Python 3.10 installed is required to use DipDup.

Minimum hardware requirements are 256 MB RAM, 1 CPU core, and some disk space for the database.

Non-Linux environments

Other UNIX-like systems (macOS, FreeBSD, etc.) should work but are not supported officially.

DipDup currently doesn't work in Windows environments due to incompatibilities in libraries it depends on. Please use WSL or Docker.

We aim to improve cross-platform compatibility in future releases.

Local installation

To begin with, create a new directory for your project and enter it. Now choose one way of managing virtual environments:

poetry

Initialize a new PEP 518 project and add DipDup to dependencies.

poetry init
poetry add dipdup

pip

Create a new virtual environment and install DipDup in it.

python -m venv .venv
source .venv/bin/activate
pip install dipdup

Core concepts

Big picture

DipDup is heavily inspired by The Graph Protocol, but there are several differences:

  • DipDup works with operation groups (explicit operation and all internal ones) and Big_map updates (lazy hash map structures) – until fully-fledged events are implemented in Tezos.
  • DipDup utilizes a microservice approach and relies heavily on existing solutions, making the SDK very lightweight and allowing it to switch API engines on demand.

Consider DipDup a set of best practices for building custom backends for decentralized applications, plus a toolkit that spares you from writing boilerplate code.

DipDup is tightly coupled with TzKT API but can generally use any data provider which implements a particular feature set. TzKT provides REST endpoints and Websocket subscriptions with flexible filters enabling selective indexing and returns "humanified" contract data, which means you don't have to handle raw Michelson expressions.

DipDup offers PostgreSQL + Hasura GraphQL Engine combo out-of-the-box to expose indexed data via REST and GraphQL with minimal configuration. However, you can use any database and API engine (e.g., write your own API backend).

Default DipDup setup and data flow

How it works

From the developer's perspective, there are three main steps for creating an indexer using DipDup framework:

  1. Write a declarative configuration file containing all the inventory and indexing rules.
  2. Describe your domain-specific data models.
  3. Implement the business logic, which is how to convert blockchain data to your models.

As a result, you get a service responsible for filling the database with the indexed data.

Within this service, there can be multiple indexers running independently.

Atomicity and persistency

DipDup applies all updates atomically block by block. In case of an emergency shutdown, it can safely recover later and continue from the level it ended. DipDup state is stored in the database per index and can be used by API consumers to determine the current indexer head.

Here are a few essential things to know before running your indexer:

  • Ensure that the database you're connecting to is used by DipDup exclusively. Changes in index configuration or models require DipDup to drop the whole database and start indexing from scratch.
  • Do not rename existing indexes in the config file without cleaning up the database first. DipDup won't handle that automatically and will treat the renamed index as new.
  • Multiple indexes pointing to different contracts should not reuse the same models (unless you know what you are doing) because synchronization is done sequentially by index.

Schema migration

DipDup does not support database schema migration: if there's any model change, it will trigger reindexing. The rationale is that it's easier and faster to start over than handle migrations that can be of arbitrary complexity and do not guarantee data consistency.

DipDup stores a hash of the SQL version of the DB schema and checks for changes each time you run indexing.

Handling chain reorgs

TzKT emits reorg messages signaling chain reorganizations. That means some blocks, including all their operations, are rolled back in favor of another branch with higher fitness. Chain reorgs happen regularly (especially in testnets), so it's not something you can ignore. You must handle such messages correctly; otherwise, you will likely accumulate duplicate or invalid data. You can implement your rollback logic by editing the on_index_rollback hook.

Single level

Single level rollbacks are processed in the following way:

  • If the new block has the same subset of operations as the replaced one – do nothing;
  • If the new block has all the operations from the replaced one AND several new operations – process those new operations;
  • If the new block misses some operations from the replaced one – trigger full reindexing.

Preparing inventory

🚧 UNDER CONSTRUCTION

This page or paragraph is yet to be written. Come back later.

Before starting indexing, you need to set up several things:

Project structure

The structure of the DipDup project package is the following:

demo_tzbtc
├── graphql
├── handlers
│   ├── __init__.py
│   ├── on_mint.py
│   └── on_transfer.py
├── hooks
│   ├── __init__.py
│   ├── on_reindex.py
│   ├── on_restart.py
│   ├── on_index_rollback.py
│   └── on_synchronized.py
├── __init__.py
├── models.py
├── sql
│   ├── on_reindex
│   ├── on_restart
│   ├── on_index_rollback
│   └── on_synchronized
└── types
    ├── __init__.py
    └── tzbtc
        ├── __init__.py
        ├── parameter
        │   ├── __init__.py
        │   ├── mint.py
        │   └── transfer.py
        └── storage.py

| path      | description                                                             |
|-----------|-------------------------------------------------------------------------|
| graphql   | GraphQL queries for Hasura (*.graphql)                                  |
| handlers  | User-defined callbacks to process matched operations and big map diffs |
| hooks     | User-defined callbacks to run manually or by schedule                   |
| models.py | Tortoise ORM models                                                     |
| sql       | SQL scripts to run from callbacks (*.sql)                               |
| types     | Codegened Pydantic typeclasses for contract storage/parameter           |

DipDup will generate all the necessary directories and files inside the project's root on init command. These include contract type definitions and callback stubs to be implemented by the developer.

Type classes

DipDup receives all smart contract data (transaction parameters, resulting storage, big_map updates) in normalized form (read more about how TzKT handles Michelson expressions) but still as raw JSON. DipDup uses contract type information to generate data classes, which allow developers to work with strictly typed data.

DipDup generates Pydantic models out of JSONSchema. You might want to install additional plugins (PyCharm, mypy) for convenient work with this library.

The following models are created at init:

  • operation indexes: storage type for all contracts met in handler patterns plus parameter type for all destination+entrypoint pairs.
  • big_map indexes: key and value types for all big map paths in handler configs.
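
For illustration, here is roughly what a generated parameter typeclass for the tzBTC transfer entrypoint could look like (a sketch: the exact field set and aliases depend on the contract's JSONSchema, so check the generated file under types/):

from pydantic import BaseModel, Extra, Field


class TransferParameter(BaseModel):
    class Config:
        extra = Extra.forbid

    # `from` is a reserved word in Python, so the generated field is aliased
    from_: str = Field(..., alias='from')
    to: str
    value: str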

Nested packages

Callback modules don't have to be in top-level hooks/handlers directories. Add one or multiple dots to the callback name to define nested packages:

package: indexer
hooks:
  foo.bar:
    callback: foo.bar

After running the init command, you'll get the following directory tree (shortened for readability):

indexer
├── hooks
│   ├── foo
│   │   ├── bar.py
│   │   └── __init__.py
│   └── __init__.py
└── sql
    └── foo
        └── bar
            └── .keep

The same rules apply to handler callbacks. Note that the callback field must be a valid Python package name - lowercase letters, underscores, and dots.

Templates and variables

Templates allow you to reuse index configuration, e.g., for different networks (mainnet/testnet) or multiple contracts sharing the same codebase.

templates:
  my_template:
    kind: operation
    datasource: <datasource>
    contracts:
      - <contract1>
    handlers:
      - callback: callback1
        pattern:
          - destination: <contract1>
            entrypoint: call

Templates have the same syntax as indexes of all kinds; the only difference is that they additionally support placeholders enabling parameterization:

field: <placeholder>

Any string value wrapped in angle brackets is treated as a placeholder, so make sure there are no collisions with the actual values. You can use a single placeholder multiple times.

Any index implementing a template must have a value for each existing placeholder; otherwise, an exception is raised. Those values are available in the handler context at ctx.template_values.
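
For example, an index instantiating my_template from the snippet above could look like this (names are illustrative):

indexes:
  foo_mainnet:
    template: my_template
    values:
      datasource: tzkt_mainnet
      contract1: foo_contract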

Defining models

DipDup uses the Tortoise ORM library to cover database operations. During initialization, DipDup generates a models.py file on the top level of the package that will contain all database models. The name and location of this file cannot be changed.

A typical models.py file looks like the following:

from tortoise import Tortoise, fields
from tortoise.models import Model


class Event(Model):
    id = fields.IntField(pk=True)
    name = fields.TextField()
    datetime = fields.DatetimeField(null=True)
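
Besides plain fields, Tortoise ORM supports relations; a minimal sketch with a foreign key (model names are illustrative and not part of any demo project):

from tortoise import Model, fields


class Token(Model):
    id = fields.BigIntField(pk=True)
    symbol = fields.TextField()


class Transfer(Model):
    id = fields.BigIntField(pk=True)
    # Tortoise references related models as '<app>.<Model>'; DipDup registers
    # project models under the `models` app label.
    token = fields.ForeignKeyField('models.Token', related_name='transfers')
    amount = fields.DecimalField(decimal_places=8, max_digits=20)
    timestamp = fields.DatetimeField()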

See the Tortoise ORM documentation to learn how to use this library.

Limitations

There are some limitations introduced to make Hasura GraphQL integration easier.

  • Table names must be in snake_case
  • Model fields must be in snake_case
  • Model fields must differ from table name

Implementing handlers

DipDup generates a separate file with a callback stub for each handler in every index specified in the configuration file.

In the case of the transaction handler, the callback method signature is the following:

from dipdup.context import HandlerContext
from dipdup.models import Transaction

from <package>.types.<typename>.parameter.<entrypoint_1> import EntryPoint1Parameter
from <package>.types.<typename>.parameter.<entrypoint_n> import EntryPointNParameter
from <package>.types.<typename>.storage import TypeNameStorage


async def on_transaction(
    ctx: HandlerContext,
    entrypoint_1: Transaction[EntryPoint1Parameter, TypeNameStorage],
    entrypoint_n: Transaction[EntryPointNParameter, TypeNameStorage],
) -> None:
    ...

where:

  • entrypoint_1 ... entrypoint_n are items from the corresponding handler pattern.
  • ctx: HandlerContext provides useful helpers and contains an internal state.
  • A Transaction model contains the typed transaction parameter and storage, plus other fields.

For the origination case, the handler signature will look similar:

from dipdup.context import HandlerContext
from dipdup.models import Origination

from <package>.types.<typename>.storage import TypeNameStorage


async def on_origination(
    ctx: HandlerContext,
    origination: Origination[TypeNameStorage],
) -> None:
    ...

An Origination model contains the origination script, initial storage (typed), amount, delegate, etc.

A Big_map update handler will look like the following:

from dipdup.context import HandlerContext
from dipdup.models import BigMapDiff

from <package>.types.<typename>.big_map.<path>_key import PathKey
from <package>.types.<typename>.big_map.<path>_value import PathValue


async def on_update(
    ctx: HandlerContext,
    update: BigMapDiff[PathKey, PathValue],
) -> None:
    ...

A BigMapDiff model contains the action (allocate, update, or remove) and nullable, typed key and value.

You can safely change argument names (e.g., in case of collisions).

Naming conventions

Python conventions require all module and function names to be in snake_case and all class names in PascalCase.

A typical imports section of big_map handler callback looks like this:

from <package>.types.<typename>.storage import TypeNameStorage
from <package>.types.<typename>.parameter.<entrypoint> import EntryPointParameter
from <package>.types.<typename>.big_map.<path>_key import PathKey
from <package>.types.<typename>.big_map.<path>_value import PathValue

Here typename is defined in the contract inventory, entrypoint is specified in the handler pattern, and path is in the handler config.

DipDup does not automatically handle name collisions. Use import ... as if multiple contracts have entrypoints that share the same name:

from <package>.types.<typename>.parameter.<entrypoint> import EntryPointParameter as Alias

Advanced usage

In this section, you will find information about advanced DipDup features.

Datasources

Datasources are DipDup connectors to various APIs. TzKT data is used for indexing; other sources are complementary.

|                                        | tzkt | tezos-node | coinbase | metadata | ipfs | http |
|----------------------------------------|------|------------|----------|----------|------|------|
| Callback context (via ctx.datasources) | ✅   | ❌         | ✅       | ✅       | ✅   | ✅   |
| DipDup index                           | ✅*  | ❌         | ❌       | ❌       | ❌   | ❌   |
| mempool service                        | ✅*  | ✅*        | ❌       | ❌       | ❌   | ❌   |
| metadata service                       | ✅*  | ❌         | ❌       | ❌       | ❌   | ❌   |

* - required

TzKT

TzKT provides REST endpoints to query historical data and SignalR (Websocket) subscriptions to get realtime updates. Flexible filters allow you to request only data needed for your application and drastically speed up the indexing process.

datasources:
  tzkt_mainnet:
    kind: tzkt
    url: https://api.tzkt.io

TzKT datasource is based on generic HTTP datasource and thus inherits its settings (optional):

datasources:
  tzkt_mainnet:
    http:
      cache: false
      retry_count:  # retry infinitely
      retry_sleep:
      retry_multiplier:
      ratelimit_rate:
      ratelimit_period:
      connection_limit: 100
      connection_timeout: 60
      batch_size: 10000

You can also wait for several block confirmations before processing operations, e.g., to mitigate chain reorgs:

datasources:
  tzkt_mainnet:
    buffer_size: 1  # indexing with single block lag

Tezos node

Tezos RPC is a standard interface provided by the Tezos node. It's not suitable for indexing purposes but is used for accessing mempool data and other things not available through TzKT.

datasources:
  tezos_node_mainnet:
    kind: tezos-node
    url: https://mainnet-tezos.giganode.io

Coinbase

A connector for Coinbase Pro API. Provides get_candles and get_oracle_data methods. It may be useful in enriching indexes of DeFi contracts with off-chain data.

datasources:
  coinbase:
    kind: coinbase

Please note that Coinbase can't replace TzKT as an index datasource. However, you can access it via the ctx.datasources mapping both within handler and job callbacks.

DipDup Metadata

dipdup-metadata is a standalone companion indexer for DipDup written in Go. Configure datasource in the following way:

datasources:
  metadata:
    kind: metadata
    url: https://metadata.dipdup.net
    network: mainnet|hangzhounet

IPFS

While working with contract/token metadata, a typical scenario is to fetch it from IPFS. DipDup now has a separate datasource to perform such requests.

datasources:
  ipfs:
    kind: ipfs
    url: https://ipfs.io/ipfs

You can use this datasource within any callback. Output is either JSON or binary data.

ipfs = ctx.get_ipfs_datasource('ipfs')

file = await ipfs.get('QmdCz7XGkBtd5DFmpDPDN3KFRmpkQHJsDgGiG16cgVbUYu')
assert file[:4].decode()[1:] == 'PDF'

file = await ipfs.get('QmSgSC7geYH3Ae4SpUHy4KutxqNH9ESKBGXoCN4JQdbtEz/package.json')
assert file['name'] == 'json-buffer'

Sending arbitrary requests

DipDup datasources do not cover all the available methods of the underlying APIs. Let's say you want to fetch the protocol of the chain you're currently indexing from TzKT:

tzkt = ctx.get_tzkt_datasource('tzkt_mainnet')
protocol_json = await tzkt.request(
    method='get',
    url='v1/protocols/current',
    cache=False,
    weight=1,  # ratelimiter leaky-bucket drops
)
assert protocol_json['hash'] == 'PtHangz2aRngywmSRGGvrcTyMbbdpWdpFKuS4uMWxg2RaH9i1qx'

Datasource HTTP connection parameters (ratelimit, backoff, etc.) are applied on every request.

Hooks

Hooks are user-defined callbacks called either from the ctx.fire_hook method or by the job scheduler (jobs config section; we'll return to this topic later).

Let's assume we want to calculate some statistics on-demand to avoid blocking an indexer with heavy computations. Add the following lines to DipDup config:

hooks:
  calculate_stats:
    callback: calculate_stats
    atomic: False
    args:
      major: bool
      depth: int

A couple of things here to pay attention to:

  • The atomic option defines whether the hook callback will be wrapped in a single SQL transaction or not. If this option is set to true, the main indexing loop will be blocked until hook execution is complete. Some statements, like REFRESH MATERIALIZED VIEW, do not need to be wrapped in transactions, so choosing the right value of the atomic option could decrease the time needed to perform initial indexing.
  • Values of the args mapping are used as type hints in the signature of the generated callback. We will return to this topic later in this article.

Now it's time to call dipdup init. The following files will be created in the project's root:

├── hooks
│   └── calculate_stats.py
└── sql
    └── calculate_stats
        └── .keep

Content of the generated callback stub:

from dipdup.context import HookContext

async def calculate_stats(
    ctx: HookContext,
    major: bool,
    depth: int,
) -> None:
    await ctx.execute_sql('calculate_stats')

By default, hooks execute SQL scripts from the corresponding subdirectory of sql/. Remove or comment out the execute_sql call to prevent this. This way, both Python and SQL code may be executed in a single hook if needed.
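
For instance, a hook body can mix arbitrary Python logic with script execution; a minimal sketch reusing the Holder model from the quickstart (purely illustrative):

from dipdup.context import HookContext

import demo_tzbtc.models as models


async def calculate_stats(
    ctx: HookContext,
    major: bool,
    depth: int,
) -> None:
    # Some Python logic first...
    holder_count = await models.Holder.all().count()
    ctx.logger.info('Currently tracking %s tzBTC holders', holder_count)
    # ...then the SQL scripts from sql/calculate_stats
    await ctx.execute_sql('calculate_stats')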

Default hooks

Every DipDup project has several default hooks; they fire on system-wide events and, like regular hooks, are not linked to any index. The names of those hooks are reserved; you can't use them in config.

on_index_rollback

Fires when TzKT datasource has received a chain reorg message which can't be processed automatically.

If your indexer is stateless, you can just drop the DB data saved after to_level and continue indexing. Otherwise, implement more complex logic. By default, this hook triggers full reindexing.
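
A minimal sketch of such a hook for a stateless indexer, assuming the generated stub receives the affected index along with the from_level/to_level boundaries (check your hooks/on_index_rollback.py for the exact signature; the Event model and indexer package are hypothetical):

from dipdup.context import HookContext

import indexer.models as models  # hypothetical project package


async def on_index_rollback(
    ctx: HookContext,
    index,  # the index being rolled back; see the generated stub for the exact type
    from_level: int,
    to_level: int,
) -> None:
    # If every row stores the level it was created at, dropping rows above
    # `to_level` is enough for a stateless indexer...
    await models.Event.filter(level__gt=to_level).delete()
    # ...otherwise fall back to full reindexing (the default behavior):
    # await ctx.reindex()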

on_restart

This hook executes right before starting indexing. It allows configuring DipDup at runtime based on data from external sources. Datasources are already initialized at execution time and are available at ctx.datasources. You can, for example, configure logging here or add contracts and indexes at runtime instead of from the static config.

on_reindex

This hook fires after the database is re-initialized after reindexing (wipe). Helpful for modifying the schema with arbitrary SQL scripts before indexing starts.

on_synchronized

This hook fires when every active index reaches a realtime state. Here you can clear internal caches or do other cleanup.

Job scheduler

Jobs are schedules for hooks. In some cases, it may come in handy to have the ability to run some code on schedule. For example, you may want to calculate statistics once per hour instead of every time a handler gets matched.
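
A hedged sketch of a jobs section wiring the calculate_stats hook from the previous section to a schedule (job names and values are illustrative; see 12.8. jobs for the exact reference):

jobs:
  stats_hourly:
    hook: calculate_stats
    crontab: "0 * * * *"
    args:
      major: True
      depth: 2
  stats_interval:
    hook: calculate_stats
    interval: 3600  # in seconds
    args:
      major: False
      depth: 2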

Arguments typechecking

DipDup will ensure that arguments passed to the hooks have the correct types when possible; a CallbackTypeError exception will be raised otherwise. Values of the args mapping in a hook config should be either built-in types or the __qualname__ of an external type like decimal.Decimal. Generic types are not supported: hints like Optional[int] = None will be correctly parsed during codegen but ignored on type checking.

See 12.8. jobs for details.

Reindexing

In some cases, DipDup can't proceed with indexing without a full wipe. Several reasons trigger reindexing; some are avoidable, some are not:

| reason          | description                                                                                                                |
|-----------------|----------------------------------------------------------------------------------------------------------------------------|
| manual          | Reindexing triggered manually from callback with ctx.reindex.                                                                |
| migration       | Applied migration requires reindexing. Check release notes before switching between major DipDup versions to be prepared.   |
| rollback        | Reorg message received from TzKT can not be processed.                                                                       |
| config_modified | One of the index configs has been modified.                                                                                  |
| schema_modified | Database schema has been modified. Try to avoid manual schema modifications in favor of SQL scripts.                         |

It is possible to configure desirable action on reindexing triggered by the specific reason.

| action              | description                                                                                                                                                                                                          |
|---------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| exception (default) | Raise ReindexingRequiredError and quit with error code. The safest option since you can trigger reindexing accidentally, e.g., by a typo in config. Don't forget to set up the correct restart policy when using it with containers. |
| wipe                | Drop the whole database and start indexing from scratch. Be careful with this option!                                                                                                                                |
| ignore              | Ignore the event and continue indexing as usual. It can lead to unexpected side-effects up to data corruption; make sure you know what you are doing.                                                                |

To configure actions for each reason, add the following section to the DipDup config:

advanced:
  ...
  reindex:
    manual: wipe
    migration: exception
    rollback: ignore
    config_modified: exception
    schema_modified: exception

Feature flags

Feature flags allow users to modify some system-wide tunables that affect the behavior of the whole framework. These options are either experimental or unsuitable for generic configurations.

| run command option    | config path                  | is stable |
|-----------------------|------------------------------|-----------|
| --early-realtime      | advanced.early_realtime      | ✅        |
| --merge-subscriptions | advanced.merge_subscriptions | ✅        |
| --postpone-jobs       | advanced.postpone_jobs       | ✅        |
| --metadata-interface  | advanced.metadata_interface  | ✅        |
|                       | advanced.skip-version-check  | ✅        |

A good practice is to set feature flags in environment-specific config files.
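
For example (config paths are taken from the table above; values are illustrative):

advanced:
  early_realtime: True
  merge_subscriptions: False
  postpone_jobs: True
  metadata_interface: False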

Early realtime

By default, DipDup enters a sync state twice: before and after establishing a realtime connection. This flag allows collecting realtime messages while sync is in progress, right after indexes are loaded.

Let's consider two scenarios:

  1. Indexing 10 contracts with 10 000 operations each. Initial indexing could take several hours. There is no need to accumulate incoming operations since resync time after establishing a realtime connection depends on the contract number, thus taking a negligible amount of time.

  2. Indexing 10 000 contracts with 10 operations each. Both initial sync and resync will take a while. But the number of operations received during this time won't affect RAM consumption much.

If you do not have strict RAM constraints, it's recommended to enable this flag. You'll get faster indexing times and decreased load on TzKT API.

Merge subscriptions

Subscribe to all operations/big map diffs during realtime indexing instead of using separate channels. This flag helps to avoid the 10000 subscription limit of TzKT and speeds up processing. The downside is increased RAM consumption during sync, especially if the early_realtime flag is enabled too.

Postpone jobs

Do not start the job scheduler until all indexes are synchronized. If your jobs perform some calculations that make sense only after indexing is fully finished, this toggle can save you some IOPS.

Metadata interface

Without this flag, calling ctx.update_contract_metadata and ctx.update_token_metadata will have no effect. The corresponding internal tables are created on reindexing either way.

Skip version check

Disables warning about running unstable or out-of-date DipDup version.

Executing SQL scripts

Put your *.sql scripts to <package>/sql. You can run these scripts from any callback with ctx.execute_sql('name'). If name is a directory, each script it contains will be executed.

Both types of scripts are executed without being wrapped in SQL transactions. It's generally a good idea to avoid touching table data in scripts.

SQL scripts are ignored if SQLite is used as a database backend.

By default, an empty sql/<hook_name> directory is generated for every hook in config during init. Comment out execute_sql in hook code to avoid executing them.

Default hooks

Scripts from sql/on_restart directory are executed each time you run DipDup. Those scripts may contain CREATE OR REPLACE VIEW or similar non-destructive operations.

Scripts from the sql/on_reindex directory are executed after the database schema is created based on the models.py module, but before indexing starts. It may be useful to change the database schema in ways that are not supported by the Tortoise ORM, e.g., to create a composite primary key.

Improving performance

This page contains tips that may help to increase indexing speed.

Optimize database schema

Postgres indexes are special lookup tables that the database search engine can use to speed up data retrieval. A database index acts like a pointer to data in a table, just like an index in a printed book. If you look at the index first, you will find the data much quicker than searching the whole book (or, in this case, the database).

You should add indexes on columns that often appear in WHERE clauses in your GraphQL queries and subscriptions.

Tortoise ORM uses BTree indexes by default. To set index on a field, add index=True to the field definition:

from tortoise import Model, fields


class Trade(Model):
    id = fields.BigIntField(pk=True)
    amount = fields.BigIntField()
    level = fields.BigIntField(index=True)
    timestamp = fields.DatetimeField(index=True)

Tune datasources

All datasources now share the same code under the hood to communicate with underlying APIs via HTTP. The configs of all datasources, and also the hasura one, can have an optional http section with any number of the following parameters set:

datasources:
  tzkt:
    kind: tzkt
    ...
    http:
      cache: True
      retry_count: 10
      retry_sleep: 1
      retry_multiplier: 1.2
      ratelimit_rate: 100
      ratelimit_period: 60
      connection_limit: 25
      batch_size: 10000
hasura:
  url: http://hasura:8080
  http:
    ...

| field              | description                                                            |
|--------------------|------------------------------------------------------------------------|
| cache              | Whether to cache responses                                             |
| retry_count        | Number of retries after request failed before giving up                |
| retry_sleep        | Sleep time between retries                                             |
| retry_multiplier   | Multiplier for sleep time between retries                              |
| ratelimit_rate     | Number of requests per period ("drops" in leaky bucket)                |
| ratelimit_period   | Period for rate limiting in seconds                                    |
| connection_limit   | Number of simultaneous connections                                     |
| connection_timeout | Connection timeout in seconds                                          |
| batch_size         | Number of items fetched in a single paginated request (for some APIs)  |

Each datasource has its own defaults. Usually, there's no reason to alter these settings unless you use self-hosted instances of TzKT or another datasource.

By default, DipDup retries failed requests infinitely, exponentially increasing the delay between attempts. Set retry_count parameter to limit the number of attempts.

The batch_size parameter is TzKT-specific. By default, DipDup limits requests to 10000 items, the maximum value allowed on public instances provided by Baking Bad. Decreasing this value will reduce the time required for TzKT to process a single request and thus reduce the load. By reducing the connection_limit parameter, you can achieve the same effect (limited to synchronizing multiple indexes concurrently).

🤓 SEE ALSO

See 12.4. datasources for details.

Use TimescaleDB for time-series

🚧 UNDER CONSTRUCTION

This page or paragraph is yet to be written. Come back later.

DipDup is fully compatible with TimescaleDB. Try its "continuous aggregates" feature, especially if dealing with market data like DEX quotes.

Cache commonly used models

If your indexer contains models that have few fields and are used primarily in relations, you can cache such models during synchronization.

Example code:

from collections import OrderedDict

from tortoise import Model, fields


class Trader(Model):
    address = fields.CharField(36, pk=True)


class TraderCache:
    def __init__(self, size: int = 1000) -> None:
        self._size = size
        self._traders: OrderedDict[str, Trader] = OrderedDict()

    async def get(self, address: str) -> Trader:
        if address not in self._traders:
            # NOTE: Already created on origination
            self._traders[address], _ = await Trader.get_or_create(address=address)
            # Evict the oldest entry when the cache is full
            if len(self._traders) > self._size:
                self._traders.popitem(last=False)

        return self._traders[address]

    def clear(self) -> None:
        # Forget cached entries, e.g., from the on_synchronized hook
        self._traders.clear()


trader_cache = TraderCache()

Use trader_cache.get in handlers. After sync is complete, you can clear this cache to free some RAM:

async def on_synchronized(
    ctx: HookContext,
) -> None:
    ...
    models.trader_cache.clear()

Callback context (ctx)

🚧 UNDER CONSTRUCTION

This page or paragraph is yet to be written. Come back later.

An instance of the HandlerContext class is passed to every handler providing a set of helper methods and read-only properties.

.reindex() -> None

Drops the entire database and starts the indexing process from scratch. on_index_rollback hook calls this helper by default.

.add_contract(name, address, typename) -> Coroutine

Add a new contract to the inventory.

.add_index(name, template, values) -> Coroutine

Add a new index to the current configuration.

.fire_hook(name, wait=True, **kwargs) -> None

Trigger hook execution. Unset wait to execute hook outside of the current database transaction.
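
For example, to trigger the calculate_stats hook from the Hooks section in fire-and-forget mode (argument values are illustrative):

await ctx.fire_hook('calculate_stats', major=False, depth=10, wait=False)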

.execute_sql(name) -> None

The execute_sql argument can be either the name of an SQL script in the sql directory or an absolute/relative path. If the path is a directory, all .sql scripts within it will be executed in alphabetical order.

.update_contract_metadata(network, address, metadata) -> None

Inserts or updates the corresponding row in the service dipdup_contract_metadata table used for exposing the 5.11 Metadata interface

.update_token_metadata(network, address, token_id, metadata) -> None

Inserts or updates the corresponding row in the service dipdup_token_metadata table used for exposing the 5.11 Metadata interface

.logger

Use this instance for logging.

.template_values

You can access values used for initializing a template index.

class dipdup.context.DipDupContext(datasources: Dict[str, dipdup.datasources.datasource.Datasource], config: dipdup.config.DipDupConfig, callbacks: dipdup.context.CallbackManager)

Class to store application context

Parameters
  • datasources – Mapping of available datasources

  • config – DipDup configuration

  • callbacks – Low-level callback interface (intended for internal use)

  • logger – Context-aware logger instance

async execute_sql(name: str) → None

Execute SQL script with given name

Parameters

name – SQL script name within <project>/sql directory

async fire_handler(name: str, index: str, datasource: dipdup.datasources.tzkt.datasource.TzktDatasource, fmt: Optional[str] = None, *args, **kwargs: Any) → None

Fire handler with given name and arguments.

Parameters
  • name – Handler name

  • index – Index name

  • datasource – An instance of datasource that triggered the handler

  • fmt – Format string for ctx.logger messages

async fire_hook(name: str, fmt: Optional[str] = None, wait: bool = True, *args, **kwargs: Any) → None

Fire hook with given name and arguments.

Parameters
  • name – Hook name

  • fmt – Format string for ctx.logger messages

  • wait – Wait for hook to finish or fire and forget

async reindex(reason: Optional[Union[str, dipdup.enums.ReindexingReason]] = None, **context) → None

Drop the whole database and restart with the same CLI arguments

async restart() → None

Restart indexer preserving CLI arguments

class dipdup.context.HandlerContext(datasources: Dict[str, dipdup.datasources.datasource.Datasource], config: dipdup.config.DipDupConfig, callbacks: dipdup.context.CallbackManager, logger: dipdup.utils.FormattedLogger, handler_config: dipdup.config.HandlerConfig, datasource: dipdup.datasources.tzkt.datasource.TzktDatasource)

Common handler context.

class dipdup.context.HookContext(datasources: Dict[str, dipdup.datasources.datasource.Datasource], config: dipdup.config.DipDupConfig, callbacks: dipdup.context.CallbackManager, logger: dipdup.utils.FormattedLogger, hook_config: dipdup.config.HookConfig)

Hook callback context.

class dipdup.context.TemplateValuesDict(ctx, **kwargs)

Internal models

| model                  | table           | description                                                                                                                                                                                                          |
|------------------------|-----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| dipdup.models.Schema   | dipdup_schema   | Hash of database schema to detect changes that require reindexing.                                                                                                                                                   |
| dipdup.models.Index    | dipdup_index    | Indexing status, level of the latest processed block, template, and template values if applicable. Relates to Head when status is REALTIME (see dipdup.models.IndexStatus for possible values of the status field). |
| dipdup.models.Head     | dipdup_head     | The latest block received by a datasource from a WebSocket connection.                                                                                                                                               |
| dipdup.models.Contract | dipdup_contract | Nothing useful for us humans. It helps DipDup to keep track of dynamically spawned contracts. A contract with the same name from the config takes priority over one from this table.                                 |

With the help of these tables, you can set up monitoring of DipDup deployment to know when something goes wrong:

SELECT NOW() - timestamp FROM dipdup_head;

Spawning indexes at runtime

DipDup allows spawning new indexes from a template at runtime. There are two ways to do that:

⚠ WARNING

DipDup is currently not able to automatically generate types and handlers for template indexes unless there is at least one static instance.

DipDup exposes several context methods that extend the current configuration with new contracts and template instances. See 5.8. Handler context for details.

See 12.13. templates for details.
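
A minimal sketch combining both helpers (contract address, names, and template values are illustrative):

await ctx.add_contract(
    name='foo_mainnet',
    address='KT1...',  # address of the discovered contract
    typename='foo',
)
await ctx.add_index(
    name='foo_mainnet_index',
    template='my_template',
    values={'datasource': 'tzkt_mainnet', 'contract1': 'foo_mainnet'},
)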

Scheduler configuration

DipDup utilizes apscheduler library to run hooks according to schedules in jobs config section. In the following example, apscheduler will spawn up to three instances of the same job every time the trigger is fired, even if previous runs are in progress:

advanced:
  scheduler:
    apscheduler.job_defaults.coalesce: True
    apscheduler.job_defaults.max_instances: 3

See apscheduler docs for details.

Note that you can't use executors from apscheduler.executors.pool module - ConfigurationError exception will be raised.

Metadata Interface

When issuing a token on the Tezos blockchain, there is an important yet insufficiently covered aspect: how various ecosystem applications (wallets, explorers, marketplaces, and others) will display and interact with it. It's about token metadata, stored wholly or partially on-chain but intended for off-chain use only.

Token metadata standards

There are several standards regulating the metadata file format and the way it can be stored and exposed to consumers:

  • TZIP-21 | Rich Metadata – describes a metadata schema and standards for contracts and tokens
  • TZIP-12 | FA2.0 – a standard for a unified token contract interface, includes an article about how to store and encode token metadata
  • TZIP-7 | FA1.2 – single asset token standard; reuses the token metadata approach from FA2.0

Leaving the metadata schema aside, let's focus on which storage approaches are currently standardized, their pros and cons, and what to do if none of the available options fits your case.

The most straightforward approach is to store everything in the contract storage, especially if it's just the basic fields (name, symbol, decimals):

storage
└── token_metadata [big_map]
    └── 0
        ├── token_id: 0
        └── token_info
            ├── name: ""
            ├── symbol: ""
            └── decimals: ""

But typically, you want to store more, like a token thumbnail icon, and it is no longer feasible to keep such large data on-chain (because you pay gas for every byte stored).
Then you can put large files somewhere off-chain (e.g., IPFS) and store just the links:

storage
└── token_metadata [big_map]
    └── 0
        ├── token_id: 0
        └── token_info
            ├── ...
            └── thumbnailUri: "ipfs://"

This approach is still costly, but sometimes (in rare cases), you need to have access to the metadata from the contract (example: Dogami).
We can go further and put the entire token info structure to IPFS:

storage
└── token_metadata [big_map]
    └── 0
        ├── token_id: 0
        └── token_info
            └── "": "ipfs://"

It is the most common case right now (example: HEN).

The main advantage of the basic approach is that all the changes applied to token metadata will result in big map diffs that are easily traceable by indexers. Even if you decide to replace the off-chain file, it will cause the IPFS link to change. In the case of HTTP links, indexers cannot detect the content change; thus, token metadata won't be updated.

Custom: off-chain view

The second approach presented in the TZIP-12 spec was intended to cover cases when there's a need to reuse the same token info or when it's not possible to expose the %token_metadata big map in the standard form. Instead, it's offered to execute a special Michelson script against the contract storage and treat the result as the token info for the requested token. The tricky part is that the script code itself is typically stored off-chain, and the whole algorithm would look like this:

  1. Try to fetch the empty string key of the %metadata big map to retrieve the TZIP-16 file location
  2. Resolve the TZIP-16 file (typically from IPFS) – it should contain the off-chain view body
  3. Fetch the current contract storage
  4. Build arguments for the off-chain view token_metadata using the fetched storage and the requested token ID
  5. Execute the script using Tezos node RPC

Although this approach is more or less viable for wallets (when you need to fetch metadata for a relatively small amount of tokens), it becomes very inefficient for indexers dealing with millions of tokens:

  • After every contract origination, one has to try to fetch the views (even if there aren't any) – it means synchronous fetching, which can take seconds in the case of IPFS
  • Executing a Michelson script is currently only* possible via Tezos node, and it's quite a heavy call (setting up the VM and contract context takes time)
  • There's no clear way to detect new token metadata addition or change – that is actually the most critical one; you never know for sure when to call the view

The off-chain view approach is not supported by the TzKT indexer, and we strongly recommend not using it, especially for contracts that can issue multiple tokens.

DipDup-based solution

The alternative we offer for the very non-standard cases is using our selective indexing framework for custom token metadata retrieval and then feeding it back to the TzKT indexer, which essentially acts as a metadata aggregator. Note that while this can seem like a circular dependency, it's resolved on the interface level: all custom DipDup metadata indexers should expose specific GraphQL tables with certain fields:

query MyQuery {
  token_metadata {
    metadata    # TZIP-21 JSON
    network     # mainnet or <protocol>net
    contract    # token contract address
    token_id    # token ID in the scope of the contract
    update_id   # integer cursor used for pagination
  }
}

DipDup handles table management for you and exposes a context-level helper.

Tezos Domains example:

await ctx.update_token_metadata(
    network=ctx.datasource.network,
    address=store_records.data.contract_address,
    token_id=token_id,
    metadata={
        'name': record_name,
        'symbol': 'TD',
        'decimals': '0',
        'isBooleanAmount': True,
        'domainData': decode_domain_data(store_records.value.data)
    },
)

TzKT can be configured to subscribe to one or multiple DipDup metadata sources; several of them are currently used in production.

TzKT token metadata flow

GraphQL API

In this section, we assume you use Hasura GraphQL Engine integration to power your API.

Before starting to do client integration, it's good to know the specifics of Hasura GraphQL protocol implementation and the general state of the GQL ecosystem.

Queries

By default, Hasura generates three types of queries for each table in your schema:

  • Generic query enabling filters by all columns
  • Single item query (by primary key)
  • Aggregation query (can be disabled)

All the GQL features such as fragments, variables, aliases, directives are supported, as well as batching.
Read more in Hasura docs.

It's important to understand that GraphQL query is just a POST request with JSON payload, and in some instances, you don't need a complicated library to talk to your backend.

Pagination

By default, Hasura does not restrict the number of rows returned per request, which could lead to abuse and a heavy load on your server. You can set up limits in the configuration file; see 12.5. hasura for details. But then you will have to paginate over the items if the response does not fit into the limits.

Subscriptions

From Hasura documentation:

Hasura GraphQL engine subscriptions are live queries, i.e., a subscription will return the latest result of the query and not necessarily all the individual events leading up to it.

This feature is essential to avoid complex state management (merging query results and subscription feed). In most scenarios, live queries are what you need to sync the latest changes from the backend.

⚠ WARNING

If the live query has a significant response size that does not fit into the limits, you need one of the following:

  1. Paginate with offset (which is not convenient)
  2. Use cursor-based pagination (e.g., by an increasing unique id).
  3. Narrow down request scope with filtering (e.g., by timestamp or level).

Ultimately, you can get "subscriptions" on top of live queries by requesting all the items having an ID greater than the maximum existing one, or all the items with a timestamp greater than now.

Websocket transport

Hasura is compatible with the subscriptions-transport-ws library, which is currently deprecated but still used by the majority of clients.

Mutations

The purpose of DipDup is to create indexers, which means you can consistently reproduce the state as long as data sources are accessible. It makes your backend "stateless" in a sense because it's tolerant of data loss.

However, in some cases you might need to introduce a non-recoverable state and mix indexed and user-generated content. DipDup allows marking such UGC tables as "immune", protecting them from being wiped. In addition to that, you will need to set up Hasura Auth and adjust write permissions for the tables (by default, they are read-only).
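
A sketch of how such tables are marked in config (the table name is illustrative; see the immune_tables config reference for details):

database:
  kind: postgres
  ...
  immune_tables:
    - user_profiles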

Lastly, you will need to execute GQL mutations to modify the state from the client side. Read more about how to do that with Hasura.

Hasura integration

This optional section is used by the DipDup executor to automatically configure the Hasura engine to track your tables.

hasura:
  url: http://hasura:8080
  admin_secret: ${HASURA_ADMIN_SECRET:-changeme}

Under the hood, DipDup generates Hasura metadata from your DB schema and applies it using Metadata API.

Hasura metadata is all about data representation in GraphQL API. The structure of the database itself is managed solely by Tortoise ORM.

Metadata configuration is idempotent: each time you call run or hasura configure command, DipDup queries the existing schema and does the merge if required. DipDup configures Hasura after reindexing, saves the hash of resulting metadata in the dipdup_schema table, and doesn't touch Hasura until needed.

Database limitations

The current version of Hasura GraphQL Engine treats public and other schemas differently. Table schema.customer becomes the schema_customer root field (or schemaCustomer if the camel_case option is enabled in DipDup config). Table public.customer becomes the customer field, without the schema prefix. There's no way to remove this prefix for now. You can track the related issue at Hasura's GitHub to know when the situation will change. Since 3.0.0-rc1, DipDup enforces the public schema name to avoid ambiguity and issues with the GenQL library. You can still use any schema name if Hasura integration is not enabled.

Authentication

DipDup sets read-only permissions for all tables and enables non-authorized access to the /graphql endpoint.

Limit number of rows

DipDup creates a user role allowed to perform queries without authorization. You can limit the maximum number of rows such queries return and also disable aggregation queries automatically generated by Hasura:

hasura:
  select_limit: 100

Note that with limits enabled, you have to use either offset or cursor-based pagination on the client-side.

Disable aggregation queries

hasura:
  allow_aggregations: False

Convert field names to camel case

For those of you coming from the JavaScript world, it may be more familiar to use camelCase for variable names instead of the snake_case Hasura uses by default. DipDup allows converting all fields in metadata to this casing:

hasura:
  camel_case: true

Now this example query to hic et nunc demo indexer...

query MyQuery {
  hic_et_nunc_token(limit: 1) {
    id
    creator_id
  }
}

...will become this one:

query MyQuery {
  hicEtNuncToken(limit: 1) {
    id
    creatorId
  }
}

All fields auto generated by Hasura will be renamed accordingly: hic_et_nunc_token_by_pk to hicEtNuncTokenByPk, delete_hic_et_nunc_token to deleteHicEtNuncToken and so on. To return to defaults, set camel_case to False and run hasura configure --force.

Keep in mind that "camelcasing" is a separate stage performed after all tables are registered. So during configuration, you can observe fields in snake_case for several seconds even if hasura.camel_case flag is set.

REST endpoints

Hasura 2.0 introduced the ability to expose arbitrary GraphQL queries as REST endpoints. By default, DipDup will generate GET and POST endpoints to fetch rows by primary key for all tables:

curl http://127.0.0.1:8080/api/rest/hicEtNuncHolder?address=tz1UBZUkXpKGhYsP5KtzDNqLLchwF4uHrGjw
{
  "hicEtNuncHolderByPk": {
    "address": "tz1UBZUkXpKGhYsP5KtzDNqLLchwF4uHrGjw"
  }
}

However, there's a limitation dictated by how Hasura parses HTTP requests: only models with primary keys of basic types (int, string, and so on) can be fetched with GET requests. An attempt to fetch a model with a BIGINT primary key will lead to the error: Expected bigint for variable id got Number. A workaround for fetching any model is to send a POST request containing a JSON payload with a single key:

curl -d '{"id": 152}' http://127.0.0.1:8080/api/rest/hicEtNuncToken
{
  "hicEtNuncTokenByPk": {
    "creatorId": "tz1UBZUkXpKGhYsP5KtzDNqLLchwF4uHrGjw",
    "id": 152,
    "level": 1365242,
    "supply": 1,
    "timestamp": "2021-03-01T03:39:21+00:00"
  }
}

We hope to get rid of this limitation someday and will let you know as soon as it happens.

Custom endpoints

You can put any number of .graphql files into graphql directory in your project's root, and DipDup will create REST endpoints for each of those queries. Let's say we want to fetch not only a specific token, but also the number of all tokens minted by its creator:

query token_and_mint_count($id: bigint) {
  hicEtNuncToken(where: {id: {_eq: $id}}) {
    creator {
      address
      tokens_aggregate {
        aggregate {
          count
        }
      }
    }
    id
    level
    supply
    timestamp
  }
}

Save this query as graphql/token_and_mint_count.graphql and run dipdup hasura configure. Now, this query is available via the REST endpoint at http://127.0.0.1:8080/api/rest/token_and_mint_count.

You can disable exposing of REST endpoints in the config:

hasura:
  rest: False

GenQL

GenQL is a great library and CLI tool that automatically generates a fully typed SDK with a built-in GQL client. It works flawlessly with Hasura and is recommended for DipDup on the client-side.

Project structure

GenQL CLI generates a ready-to-use package, compiled and prepared to publish to NPM. A typical setup is a mono repository containing several packages, including the auto-generated SDK and your front-end application.

project_root/
├── package.json
└── packages/
    ├── app/
    │   ├── package.json
    │   └── src/
    └── sdk/
        └── package.json

SDK package config

Your minimal package.json file will look like the following:

{
  "name": "%PACKAGE_NAME%",
  "version": "0.0.1",
  "main": "dist/index.js",
  "types": "dist/index.d.ts",
  "devDependencies": {
    "@genql/cli": "^2.6.0"
  },
  "dependencies": {
    "@genql/runtime": "2.6.0",
    "graphql": "^15.5.0"
  },
  "scripts": {
    "build": "genql --endpoint %GRAPHQL_ENDPOINT% --output ./dist"
  }
}

That's it! Now you only need to install dependencies and execute the build target:

yarn
yarn build

Read more about CLI options available.

Demo

Create a package.json file with

  • %PACKAGE_NAME% => metadata-sdk
  • %GRAPHQL_ENDPOINT% => https://metadata.dipdup.net/v1/graphql

And generate the client:

yarn
yarn build

Then create new file index.ts and paste this query:

import { createClient, everything } from './dist'

const client = createClient()

client.chain.query
    .token_metadata({ where: { network: { _eq: 'mainnet' } }})
    .get({ ...everything })
    .then(res => console.log(res))

We need some additional dependencies to run our sample:

yarn add typescript ts-node

Finally:

npx ts-node index.ts

You should see a list of tokens with metadata attached in your console.

Troubleshooting

🚧 UNDER CONSTRUCTION

This page or paragraph is yet to be written. Come back later.

Common issues

MigrationRequiredError

Reason

DipDup was updated to a release whose spec_version differs from the value in the config file. You need to perform an automatic migration before starting indexing again.

Solution

  1. Run dipdup migrate command.
  2. Review and commit changes.

ReindexingRequiredError

Reason

There can be several possible reasons that require reindexing from scratch:

  • Your DB models or your config (and thus likely a handler) changed, which means that all the previous data is probably incorrect or will be inconsistent with the new data. Of course, you can handle that manually or write a migration – luckily, there is a way to disable reindexing for such cases.
  • Also, DipDup internal models or some raw indexing mechanisms changed (e.g., a serious bug was fixed), and, unfortunately, it is required to re-run the indexer. Sometimes those changes do not affect your particular case, and you can skip the reindexing part.
  • Finally, there are chain reorgs happening from time to time, and if you don't have your on_index_rollback handler implemented – be ready for those errors. Luckily, there is a generic approach to mitigate that – just wait for another block before applying the previous one, i.e., introduce a lag into the indexing process.

Solution

You can configure how DipDup reacts in each of the cases described. Here's an example setup:

advanced:
  reindex:
    manual: exception
    migration: exception
    rollback: exception
    config_modified: ignore
    schema_modified: ignore

To index with a lag, add this TzKT datasource preference:

datasources:
  tzkt_mainnet:
    kind: tzkt
    url: ${TZKT_URL:-https://api.tzkt.io}
    buffer_size: 1  # <--- one level reorgs are most common, 2-level reorgs are super rare

Reporting bugs

🚧 UNDER CONSTRUCTION

This page or paragraph is yet to be written. Come back later.

Deployment and operations

This section contains recipes to deploy and maintain DipDup instances.

Database engines

DipDup officially supports the following databases: SQLite, PostgreSQL, TimescaleDB. This page will help you choose a database engine that mostly suits your needs.

|                    | SQLite            | PostgreSQL    | TimescaleDB              |
|--------------------|-------------------|---------------|--------------------------|
| Supported versions | any               | any           | any                      |
| When to use        | early development | general usage | working with timeseries  |
| Performance        | good              | better        | great in some scenarios  |
| SQL scripts        | ❌                | ✅            | ✅                       |
| Immune tables*     | ❌                | ✅            | ✅                       |
| Hasura integration | ❌                | ✅**          | ✅**                     |

* β€” see immune_tables config reference for details.

** β€” schema name must be public

While sometimes it's convenient to use one database engine for development and another one for production, be careful with specific column types that behave differently in various engines.

Building Docker images

FROM dipdup/dipdup:5.0.0

# Uncomment if you have additional dependencies in pyproject.toml
# COPY pyproject.toml poetry.lock ./
# RUN inject_pyproject

COPY indexer indexer
COPY dipdup.yml dipdup.prod.yml ./

Docker compose

Make sure you have docker and docker-compose installed.

Example docker-compose.yml file:

version: "3.8"

services:
  indexer:
    build: .
    depends_on:
      - db
    command: ["-c", "dipdup.yml", "-c", "dipdup.prod.yml", "run"]
    restart: "no"
    environment:
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-changeme}
      - ADMIN_SECRET=${ADMIN_SECRET:-changeme}
    volumes:
      - ./dipdup.yml:/home/dipdup/dipdup.yml
      - ./dipdup.prod.yml:/home/dipdup/dipdup.prod.yml
      - ./indexer:/home/dipdup/indexer
    ports:
      - 127.0.0.1:9000:9000

  db:
    image: timescale/timescaledb:latest-pg13
    ports:
      - 127.0.0.1:5432:5432
    volumes:
      - db:/var/lib/postgresql/data
    environment:
      - POSTGRES_USER=dipdup
      - POSTGRES_DB=dipdup
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-changeme}
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
    deploy:
      mode: replicated
      replicas: 1

  hasura:
    image: hasura/graphql-engine:v2.4.0
    ports:
      - 127.0.0.1:8080:8080
    depends_on:
      - db
    restart: always
    environment:
      - HASURA_GRAPHQL_DATABASE_URL=postgres://dipdup:${POSTGRES_PASSWORD:-changeme}@db:5432/dipdup
      - HASURA_GRAPHQL_ENABLE_CONSOLE=true
      - HASURA_GRAPHQL_DEV_MODE=true
      - HASURA_GRAPHQL_ENABLED_LOG_TYPES=startup, http-log, webhook-log, websocket-log, query-log
      - HASURA_GRAPHQL_ADMIN_SECRET=${ADMIN_SECRET:-changeme}
      - HASURA_GRAPHQL_UNAUTHORIZED_ROLE=user
      - HASURA_GRAPHQL_STRINGIFY_NUMERIC_TYPES=true

volumes:
  db:

Environment variables are expanded in the DipDup config file; Postgres password and Hasura secret are forwarded in this example.

Create a separate dipdup.<environment>.yml file for this stack:

database:
  kind: postgres
  host: db
  port: 5432
  user: dipdup
  password: ${POSTGRES_PASSWORD:-changeme}
  database: dipdup
  schema_name: demo

hasura:
  url: http://hasura:8080
  admin_secret: ${ADMIN_SECRET:-changeme}
  allow_aggregations: False
  camel_case: true
  select_limit: 100

Note the hostnames (resolved in the docker network) and environment variables (expanded by DipDup).

Build and run the containers:

docker-compose up -d --build

We recommend lazydocker for monitoring your application.

Deploying with Docker Swarm

🚧 UNDER CONSTRUCTION

This page or paragraph is yet to be written. Come back later.

Sentry integration

Sentry is error-tracking software that can be used either as a hosted service or on-premise. It dramatically improves the troubleshooting experience and requires nearly zero configuration. To start catching exceptions with Sentry in your project, add the following section to the dipdup.yml config:

sentry:
  dsn: https://...
  environment: dev
  debug: False

You can obtain the Sentry DSN from the web interface at Settings -> Projects -> <project_name> -> Client Keys (DSN). The cool thing is that if you catch an exception and suspect there's a bug in DipDup, you can share this event with us using a public link (created via the Share menu).

Prometheus integration

Available metrics

The following metrics will be exposed:

| metric name | description |
|---|---|
| dipdup_indexes_total | Number of indexes in operation by status |
| dipdup_index_level_sync_duration_seconds | Duration of indexing a single level |
| dipdup_index_level_realtime_duration_seconds | Duration of last index synchronization |
| dipdup_index_total_sync_duration_seconds | Duration of the last index synchronization |
| dipdup_index_total_realtime_duration_seconds | Duration of the last index realtime synchronization |
| dipdup_index_levels_to_sync_total | Number of levels to reach synced state |
| dipdup_index_levels_to_realtime_total | Number of levels to reach realtime state |
| dipdup_index_handlers_matched_total | Index total hits |
| dipdup_datasource_head_updated_timestamp | Timestamp of the last head update |
| dipdup_datasource_rollbacks_total | Number of rollbacks |
| dipdup_http_errors_total | Number of HTTP errors |
| dipdup_callback_duration_seconds | Duration of callback execution |

Logging

Currently, you have two options to configure logging:

  1. Manually in the on_restart hook:

import logging

async def on_restart(
    ctx: HookContext,
) -> None:
    logging.getLogger('dipdup').setLevel('DEBUG')

  2. With a Python logging config passed via the -l option:

⚠ WARNING

This feature will be deprecated soon. Consider configuring logging inside the on_restart hook.

dipdup -l logging.yml run

Example config:

version: 1
disable_existing_loggers: false
formatters:
  brief:
    format: "%(levelname)-8s %(name)-20s %(message)s"
handlers:
  console:
    level: INFO
    formatter: brief
    class: logging.StreamHandler
    stream: ext://sys.stdout
loggers:
  dipdup:
    level: INFO

  aiosqlite:
    level: INFO
  db_client:
    level: INFO
root:
  level: INFO
  handlers:
    - console

Monitoring

🚧 UNDER CONSTRUCTION

This page or paragraph is yet to be written. Come back later.

Backup and restore

DipDup has no built-in functionality to backup and restore the database at the moment. The good news is that DipDup indexes are fully atomic, which means you can perform a backup with regular psql/pg_dump tools regardless of the DipDup state.

This page contains several recipes for backup/restore.

Scheduled backup to S3

This example is for Swarm deployments. We use this solution to back up our services in production. Adapt it to your needs.

version: "3.8"
services:
  indexer:
    ...
  db:
    ...
  hasura:
    ...

  backuper:
    image: ghcr.io/dipdup-net/postgres-s3-backup:master
    environment:
      - S3_ENDPOINT=${S3_ENDPOINT:-https://fra1.digitaloceanspaces.com}
      - S3_ACCESS_KEY_ID=${S3_ACCESS_KEY_ID}
      - S3_SECRET_ACCESS_KEY=${S3_SECRET_ACCESS_KEY}
      - S3_BUCKET=dipdup
      - S3_PATH=dipdup
      - S3_FILENAME=${SERVICE}-postgres
      - PG_BACKUP_FILE=${PG_BACKUP_FILE}
      - PG_BACKUP_ACTION=${PG_BACKUP_ACTION:-dump}
      - PG_RESTORE_JOBS=${PG_RESTORE_JOBS:-8}
      - POSTGRES_USER=${POSTGRES_USER:-dipdup}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-changeme}
      - POSTGRES_DB=${POSTGRES_DB:-dipdup}
      - POSTGRES_HOST=${POSTGRES_HOST:-db}
      - HEARTBEAT_URI=${HEARTBEAT_URI}
      - SCHEDULE=${SCHEDULE}
    deploy:
      mode: replicated
      replicas: ${BACKUP_ENABLED:-0}
      restart_policy:
        condition: on-failure
        delay: 10s
        max_attempts: 5
        window: 120s
      placement: *placement
    networks:
      - internal
    logging: *logging

Automatic restore on rollback

This awesome code was contributed by @852Kerfunkle, author of tz1and project.

<project>/backups.py

# Imports below assume the `sh` package is used to wrap the pg_dump/psql binaries
import logging
from io import StringIO

from sh import ErrorReturnCode, pg_dump, psql

from dipdup.config import PostgresDatabaseConfig

_logger = logging.getLogger(__name__)

...

def backup(level: int, database_config: PostgresDatabaseConfig):
    ...

    with open('backup.sql', 'wb') as f:
        try:
            err_buf = StringIO()
            pg_dump('-d', f'postgresql://{database_config.user}:{database_config.password}@{database_config.host}:{database_config.port}/{database_config.database}', '--clean',
                '-n', database_config.schema_name, _out=f, _err=err_buf) #, '-E', 'UTF8'
        except ErrorReturnCode:
            err = err_buf.getvalue()
            _logger.error(f'Database backup failed: {err}')


def restore(level: int, database_config: PostgresDatabaseConfig):
    ...

    with open('backup.sql', 'r') as f:
        try:
            err_buf = StringIO()
            psql('-d', f'postgresql://{database_config.user}:{database_config.password}@{database_config.host}:{database_config.port}/{database_config.database}',
                '-n', database_config.schema_name, _in=f, _err=err_buf)
        except ErrorReturnCode:
            err = err_buf.getvalue()
            _logger.error(f'Database restore failed: {err}')
            raise Exception("Failed to restore")

def get_available_backups():
    ...


def delete_old_backups():
    ...

<project>/hooks/on_index_rollback.py

...

async def on_index_rollback(
    ctx: HookContext,
    index: Index,
    from_level: int,
    to_level: int,
) -> None:
    await ctx.execute_sql('on_index_rollback')

    database_config: Union[SqliteDatabaseConfig, PostgresDatabaseConfig] = ctx.config.database

    # if not a postgres db, reindex.
    if database_config.kind != "postgres":
        await ctx.reindex(ReindexingReason.ROLLBACK)

    available_levels = backups.get_available_backups()

    # if no backups available, reindex
    if not available_levels:
        await ctx.reindex(ReindexingReason.ROLLBACK)

    # find the right level, i.e. the one that's closest to to_level
    chosen_level = 0
    for level in available_levels:
        if level <= to_level and level > chosen_level:
            chosen_level = level

    # try to restore or reindex
    try:
        backups.restore(chosen_level, database_config)
        await ctx.restart()
    except Exception:
        await ctx.reindex(ReindexingReason.ROLLBACK)

<project>/hooks/run_backups.py

...

async def run_backups(
    ctx: HookContext,
) -> None:
    database_config: Union[SqliteDatabaseConfig, PostgresDatabaseConfig] = ctx.config.database

    if database_config.kind != "postgres":
        return

    level = ctx.get_tzkt_datasource("tzkt_mainnet")._level.get(MessageType.head)

    if level is None:
        return

    backups.backup(level, database_config)
    backups.delete_old_backups()

<project>/hooks/simulate_reorg.py

...

async def simulate_reorg(
    ctx: HookContext
) -> None:
    level = ctx.get_tzkt_datasource("tzkt_mainnet")._level.get(MessageType.head)

    if level:
        await ctx.fire_hook(
            "on_index_rollback",
            wait=True,
            index=None,  # type: ignore
            from_level=level,
            to_level=level - 2,
        )

Cookbook

🚧 UNDER CONSTRUCTION

This page or paragraph is yet to be written. Come back later.

Processing offchain data

🚧 UNDER CONSTRUCTION

This page or paragraph is yet to be written. Come back later.

Reusing typename for different contracts

In some cases, you may want to make manual changes in typeclasses and ensure they won't be lost on init. Let's say you want to reuse a typename for multiple contracts providing the same interface (like FA1.2 and FA2 tokens) but having different storage structures. You can comment out the differing fields which are not important for your index.

types/contract_typename/storage.py

# dipdup: ignore

...

class ContractStorage(BaseModel):
    class Config:
        extra = Extra.ignore

    some_common_big_map: Dict[str, str]
    # unique_big_map_a: Dict[str, str]
    # unique_big_map_b: Dict[str, str]

Don't forget the Extra.ignore Pydantic hint; otherwise, indexing will fail. Files starting with the # dipdup: ignore comment won't be overwritten on init.

Synchronizing multiple handlers/hooks

🚧 UNDER CONSTRUCTION

This page or paragraph is yet to be written. Come back later.

Multiprocessing

It's impossible to use apscheduler pool executors with hooks because HookContext is not pickle-serializable, so they are now forbidden in the advanced.scheduler config. However, thread/process pools can come in handy in many situations, and it would be nice to have them in the DipDup context. For now, I can suggest implementing custom commands as a workaround and performing any resource-hungry tasks within them. Put the following code in <project>/cli.py:

from contextlib import AsyncExitStack

import asyncclick as click
from dipdup.cli import cli, cli_wrapper
from dipdup.config import DipDupConfig
from dipdup.context import DipDupContext
from dipdup.utils.database import tortoise_wrapper


@cli.command(help='Run heavy calculations')
@click.pass_context
@cli_wrapper
async def do_something_heavy(ctx):
    config: DipDupConfig = ctx.obj.config
    url = config.database.connection_string
    models = f'{config.package}.models'

    async with AsyncExitStack() as stack:
        await stack.enter_async_context(tortoise_wrapper(url, models))
        ...

if __name__ == '__main__':
    cli(prog_name='dipdup', standalone_mode=False)  # type: ignore

Then use python -m <project>.cli instead of dipdup as an entrypoint. Now you can call do-something-heavy like any other dipdup command. The dipdup.cli:cli group handles argument and config parsing, graceful shutdown, and other boilerplate. The rest is up to you; use dipdup.dipdup:DipDup.run as a reference. And keep in mind that Tortoise ORM is not thread-safe. I aim to implement ctx.pool_apply and ctx.pool_map methods to execute code in pools with magic within existing DipDup hooks, but there's no ETA yet.
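
If the heavy part is CPU-bound, you can offload it to a process pool inside such a command. Below is a minimal sketch using only the standard library; crunch_numbers is a hypothetical stand-in for your computation and must be defined at module level so it can be pickled:

import asyncio
from concurrent.futures import ProcessPoolExecutor


def crunch_numbers(batch: list[int]) -> int:
    # Placeholder for a CPU-heavy computation
    return sum(x * x for x in batch)


async def do_heavy_part() -> None:
    loop = asyncio.get_running_loop()
    batches = [[1, 2, 3], [4, 5, 6]]
    with ProcessPoolExecutor() as pool:
        # Run each batch in a separate worker process and await the results
        results = await asyncio.gather(
            *(loop.run_in_executor(pool, crunch_numbers, batch) for batch in batches)
        )
    print(results)

Keep database access (Tortoise ORM) in the main process; worker processes should only do pure computation.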

Examples

🚧 UNDER CONSTRUCTION

This page or paragraph is yet to be written. Come back later.

Demo projects

Here are several minimal examples of how to use various DipDup features in real-case scenarios:

Built with DipDup

This page is a brief overview of projects which use DipDup as an indexing solution.

Want to see your project on this page? Create an issue on GitHub!

HicDEX

Homepage | API | GitHub

HicDEX is a Tezos indexer for hicetnunc.art marketplace. Indexed data is available with a public GraphQL endpoint.

Homebase

Homepage | GitHub

Homebase is a web application that enables users to create and manage/use DAOs on the Tezos blockchain. This application aims to help empower community members and developers to launch and participate in Tezos-based DAOs.

Tezos Profiles

Homepage | API | GitHub

Tezos Profiles enables you to associate your online identity with your Tezos account.

Juster

Homepage | API | GitHub

Juster is an on-chain smart contract platform allowing users to take part in an automated betting market by creating events, providing liquidity to them, and making bets.

tz1and

A Virtual World and NFT Marketplace.

Homepage | API | GitHub

Services (plugins)

Services are standalone companion indexers written in Go.

mempool

This is an optional section used by the mempool indexer plugin. It uses contracts and datasources aliases as well as the database connection.

Mempool configuration has two sections: settings (optional) and indexers (required).

See the mempool plugin article (../advanced/mempool-plugin.md) for details.

Settings

This section is optional, and so are all the settings keys.

mempool:
  settings:
    keep_operations_seconds: 172800
    expired_after_blocks: 60
    keep_in_chain_blocks: 10
    mempool_request_interval_seconds: 10
    rpc_timeout_seconds: 10
  indexers:
    ...

keep_operations_seconds

How long to store operations that did not get into the chain. After that period, such operations will be wiped from the database. Default value is 172800 seconds (2 days).

expired_after_blocks

When level(head) - level(operation.branch) >= expired_after_blocks and the operation is still not included in the chain, it's marked as expired. Default value is 60 blocks (~1 hour).

keep_in_chain_blocks

Since the main purpose of this plugin is to index mempool operations (it's effectively a rolling index), all the operations that were included in the chain are removed from the database after the specified period of time. Default value is 10 blocks (~10 minutes).

mempool_request_interval_seconds

How often Tezos nodes should be polled for pending mempool operations. Default value is 10 seconds.

rpc_timeout_seconds

Tezos node request timeout. Default value is 10 seconds.

Indexers

You can index several networks at once, or index different nodes independently. Indexer names are not standardized, but for clarity it's better to stick with some meaningful keys:

 mempool:
   settings:
     ...
   indexers:
     mainnet:
       filters:
         kinds:
           - transaction
         accounts:
           - contract_alias
       datasources:
         tzkt: tzkt_mainnet
         rpc: 
           - node_mainnet
     edonet:
     florencenet: 

Each indexer object has two keys: filters and datasources (required).

Filters

An optional section specifying which mempool operations should be indexed. By default all transactions will be indexed.

kinds

Array of operation kinds; the default value is transaction (single item).
The complete list of allowed values:

  • activate_account
  • ballot
  • delegation*
  • double_baking_evidence
  • double_endorsement_evidence
  • endorsement
  • origination*
  • proposal
  • reveal*
  • seed_nonce_revelation
  • transaction*

* β€” manager operations.

accounts

Array of contract aliases used to filter operations by source or destination.
NOTE: applied to manager operations only.

Datasources

Mempool plugin is tightly coupled with TzKT and Tezos node providers.

tzkt

An alias pointing to a datasource of kind tzkt is expected.

rpc

An array of aliases pointing to datasources of kind tezos-node.
Polling multiple nodes allows detecting more refused operations and makes indexing more robust in general.

metadata

This is an optional section used by the metadata indexer plugin. It uses contracts and datasources aliases as well as the database connection.

Metadata configuration has two required sections: settings and indexers.

See the metadata plugin article (../advanced/metadata-plugin.md) for details.

Settings

metadata:
  settings:
    ipfs_gateways:
      - https://cloudflare-ipfs.com
    ipfs_timeout: 10
    http_timeout: 10
    max_retry_count_on_error: 3
    contract_service_workers: 15
    token_service_workers: 75
  indexers:
    ...

ipfs_gateways

An array of IPFS gateways. The indexer polls them sequentially until it gets a result or runs out of attempts. It is recommended to specify more than one gateway to overcome propagation issues, rate limits, and other problems.

ipfs_timeout

How long DipDup will wait for a single IPFS gateway response. Default value is 10 seconds.

http_timeout

How long DipDup will wait for a HTTP server response. Default value is 10 seconds.

max_retry_count_on_error

If DipDup fails to get a response from IPFS gateway or HTTP server, it will try again after some time, until it runs out of attempts. Default value is 3 attempts.

contract_service_workers

Count of contract service workers which resolve contract metadata. Default value is 5.

token_service_workers

Count of token service workers which resolve token metadata. Default value is 5.

Indexers

You can index several networks at once, or go with a single one. Indexer names are not standardized, but for clarity it's better to stick with some meaningful keys:

metadata:
  settings:
    ...
  indexers:
    mainnet:
      filters:
        accounts:
          - contract_alias
      datasources:
        tzkt: tzkt_mainnet

Each indexer object has two keys: filters and datasources (required).

Filters

accounts

Array of contract aliases used to filter big map updates by the owner contract address.

Datasources

Metadata plugin is tightly coupled with TzKT provider.

tzkt

An alias pointing to a datasource of kind tzkt is expected.

dipdupΒΆ

Manage and run DipDup indexers.

Full docs: https://dipdup.net/docs

Report an issue: https://github.com/dipdup-net/dipdup-py/issues

dipdup [OPTIONS] COMMAND [ARGS]...

Options

--versionΒΆ

Show the version and exit.

-c, --config <config>ΒΆ

A path to DipDup project config (default: dipdup.yml).

-e, --env-file <env_file>ΒΆ

A path to .env file containing KEY=value strings.

-l, --logging-config <logging_config>ΒΆ

A path to Python logging config in YAML format.

cacheΒΆ

Manage internal cache.

dipdup cache [OPTIONS] COMMAND [ARGS]...

clearΒΆ

Clear request cache of DipDup datasources.

dipdup cache clear [OPTIONS]

showΒΆ

Show information about DipDup disk caches.

dipdup cache show [OPTIONS]

configΒΆ

Commands to manage DipDup configuration.

dipdup config [OPTIONS] COMMAND [ARGS]...

envΒΆ

Dump environment variables used in DipDup config.

If variable is not set, default value will be used.

dipdup config env [OPTIONS]

Options

-f, --file <file>ΒΆ

Output to file instead of stdout.

exportΒΆ

Print config after resolving all links and templates.

WARNING: Avoid sharing the output with third parties when the --unsafe flag is set; it may contain secrets!

dipdup config export [OPTIONS]

Options

--unsafeΒΆ

Resolve environment variables or use default values from config.

hasuraΒΆ

Hasura integration related commands.

dipdup hasura [OPTIONS] COMMAND [ARGS]...

configureΒΆ

Configure Hasura GraphQL Engine to use with DipDup.

dipdup hasura configure [OPTIONS]

Options

--forceΒΆ

Proceed even if Hasura is already configured.

initΒΆ

Generate project tree, missing callbacks and types.

This command is idempotent, meaning it won’t overwrite previously generated files unless asked explicitly.

dipdup init [OPTIONS]

Options

--overwrite-typesΒΆ

Regenerate existing types.

--keep-schemasΒΆ

Do not remove JSONSchemas after generating types.

migrateΒΆ

Migrate project to the new spec version.

If you’re getting MigrationRequiredError after updating DipDup, this command will fix imports and type annotations to match the current spec_version. Review and commit changes after running it.

dipdup migrate [OPTIONS]

runΒΆ

Run indexer.

Execution can be gracefully interrupted with Ctrl+C or SIGTERM signal.

dipdup run [OPTIONS]

Options

--postpone-jobsΒΆ

Do not start job scheduler until all indexes are synchronized.

--early-realtimeΒΆ

Establish a realtime connection before all indexes are synchronized.

--merge-subscriptionsΒΆ

Subscribe to all operations/big map diffs during realtime indexing.

--metadata-interfaceΒΆ

Enable metadata interface.

schemaΒΆ

Manage database schema.

dipdup schema [OPTIONS] COMMAND [ARGS]...

approveΒΆ

Continue to use existing schema after reindexing was triggered.

dipdup schema approve [OPTIONS]

exportΒΆ

Print SQL schema including scripts from sql/on_reindex.

This command may help you debug inconsistency between project models and expected SQL schema.

dipdup schema export [OPTIONS]

initΒΆ

Prepare a database for running DipDup.

This command creates tables based on your models, then executes sql/on_reindex to finish preparation - the same things DipDup does when run on a clean database.

dipdup schema init [OPTIONS]

wipeΒΆ

Drop all database tables, functions and views.

WARNING: This action is irreversible! All indexed data will be lost!

dipdup schema wipe [OPTIONS]

Options

--immuneΒΆ

Drop immune tables too.

--forceΒΆ

Skip confirmation prompt.

statusΒΆ

Show the current status of indexes in the database.

dipdup status [OPTIONS]

Config file reference

DipDup configuration is stored in YAML files of a specific format. By default, DipDup searches for dipdup.yml file in the current working directory, but you can provide any path with a -c CLI option:

dipdup -c configs/config.yml run

General structure

DipDup configuration file consists of several logical blocks:

| Block             | Sections                            |
|-------------------|-------------------------------------|
| Header            | spec_version*, package*             |
| Inventory         | database*, contracts*, datasources* |
| Index definitions | indexes, templates                  |
| Integrations      | sentry, hasura                      |
| Hooks             | hooks, jobs                         |

* β€” required sections

Environment variables

DipDup supports compose-style variable expansion with optional default value:

field: ${ENV_VAR:-default_value}

You can use environment variables throughout the configuration file, except for property names (YAML object keys).

Merging config files

DipDup allows you to customize the configuration for a specific environment or workflow. It works similarly to docker-compose, but only for top-level sections. If you want to override a nested property, you need to recreate the whole top-level section. To merge several DipDup config files, provide the -c command-line option multiple times:

dipdup -c dipdup.yml -c dipdup.prod.yml run

Run the config export command if unsure about the final config used by DipDup.

class dipdup.config.AdvancedConfig(reindex: typing.Dict[dipdup.enums.ReindexingReason, dipdup.enums.ReindexingAction] = <factory>, scheduler: typing.Optional[typing.Dict[str, typing.Any]] = None, postpone_jobs: bool = False, early_realtime: bool = False, merge_subscriptions: bool = False, metadata_interface: bool = False, skip_version_check: bool = False)ΒΆ

Feature flags and other advanced config.

Parameters
  • reindex – Mapping of reindexing reasons and actions DipDup performs

  • scheduler – apscheduler scheduler config

  • postpone_jobs – Do not start job scheduler until all indexes are in realtime state

  • early_realtime – Establish realtime connection immediately after startup

  • merge_subscriptions – Subscribe to all operations instead of exact channels

  • metadata_interface – Expose metadata interface for TzKT

  • skip_version_check – Do not check for new DipDup versions on startup

class dipdup.config.BigMapHandlerConfig(callback: str, contract: Union[str, dipdup.config.ContractConfig], path: str)ΒΆ

Big map handler config

Parameters
  • contract – Contract to fetch big map from

  • path – Path to big map (alphanumeric string with dots)

initialize_big_map_type(package: str) NoneΒΆ

Resolve imports and initialize key and value type classes

class dipdup.config.BigMapIndexConfig(kind: Literal['big_map'], datasource: Union[str, dipdup.config.TzktDatasourceConfig], handlers: Tuple[dipdup.config.BigMapHandlerConfig, ...], skip_history: dipdup.enums.SkipHistory = SkipHistory.never, first_level: int = 0, last_level: int = 0)ΒΆ

Big map index config

Parameters
  • kind – always big_map

  • datasource – Index datasource to fetch big maps with

  • handlers – Description of big map diff handlers

  • skip_history – Fetch only current big map keys ignoring historical changes

  • first_level – Level to start indexing from

  • last_level – Level to stop indexing at (DipDup will terminate at this level)

class dipdup.config.CallbackMixin(callback: str)ΒΆ

Mixin for callback configs

Parameters

callback – Callback name

class dipdup.config.CodegenMixinΒΆ

Base for pattern config classes containing methods required for codegen

locate_arguments() Dict[str, Optional[Type]]ΒΆ

Try to resolve scope annotations for arguments

class dipdup.config.CoinbaseDatasourceConfig(kind: Literal['coinbase'], api_key: Optional[str] = None, secret_key: Optional[str] = None, passphrase: Optional[str] = None, http: Optional[dipdup.config.HTTPConfig] = None)ΒΆ

Coinbase datasource config

Parameters
  • kind – always β€˜coinbase’

  • api_key – API key

  • secret_key – API secret key

  • passphrase – API passphrase

  • http – HTTP client configuration

class dipdup.config.ContractConfig(address: str, typename: Optional[str] = None)ΒΆ

Contract config

Parameters
  • address – Contract address

  • typename – User-defined alias for the contract script

class dipdup.config.DipDupConfig(spec_version: str, package: str, datasources: typing.Dict[str, typing.Union[dipdup.config.TzktDatasourceConfig, dipdup.config.CoinbaseDatasourceConfig, dipdup.config.MetadataDatasourceConfig, dipdup.config.IpfsDatasourceConfig, dipdup.config.HttpDatasourceConfig]], database: typing.Union[dipdup.config.SqliteDatabaseConfig, dipdup.config.PostgresDatabaseConfig] = SqliteDatabaseConfig(kind='sqlite', path=':memory:'), contracts: typing.Dict[str, dipdup.config.ContractConfig] = <factory>, indexes: typing.Dict[str, typing.Union[dipdup.config.OperationIndexConfig, dipdup.config.BigMapIndexConfig, dipdup.config.HeadIndexConfig, dipdup.config.TokenTransferIndexConfig, dipdup.config.IndexTemplateConfig]] = <factory>, templates: typing.Dict[str, typing.Union[dipdup.config.OperationIndexConfig, dipdup.config.BigMapIndexConfig, dipdup.config.HeadIndexConfig, dipdup.config.TokenTransferIndexConfig]] = <factory>, jobs: typing.Dict[str, dipdup.config.JobConfig] = <factory>, hooks: typing.Dict[str, dipdup.config.HookConfig] = <factory>, hasura: typing.Optional[dipdup.config.HasuraConfig] = None, sentry: typing.Optional[dipdup.config.SentryConfig] = None, prometheus: typing.Optional[dipdup.config.PrometheusConfig] = None, advanced: dipdup.config.AdvancedConfig = AdvancedConfig(reindex={}, scheduler=None, postpone_jobs=False, early_realtime=False, merge_subscriptions=False, metadata_interface=False, skip_version_check=False), custom: typing.Dict[str, typing.Any] = <factory>)ΒΆ

Main indexer config

Parameters
  • spec_version – Version of specification

  • package – Name of indexer’s Python package, existing or not

  • datasources – Mapping of datasource aliases and datasource configs

  • database – Database config

  • contracts – Mapping of contract aliases and contract configs

  • indexes – Mapping of index aliases and index configs

  • templates – Mapping of template aliases and index templates

  • jobs – Mapping of job aliases and job configs

  • hooks – Mapping of hook aliases and hook configs

  • hasura – Hasura integration config

  • sentry – Sentry integration config

  • prometheus – Prometheus integration config

  • advanced – Advanced config

  • custom – User-defined Custom config

property oneshot: boolΒΆ

Whether all indexes have last_level field set

property package_path: strΒΆ

Absolute path to the indexer package, existing or default

property per_index_rollback: boolΒΆ

Check if package has on_index_rollback hook

class dipdup.config.HTTPConfig(cache: Optional[bool] = None, retry_count: Optional[int] = None, retry_sleep: Optional[float] = None, retry_multiplier: Optional[float] = None, ratelimit_rate: Optional[int] = None, ratelimit_period: Optional[int] = None, connection_limit: Optional[int] = None, connection_timeout: Optional[int] = None, batch_size: Optional[int] = None)ΒΆ

Advanced configuration of HTTP client

Parameters
  • cache – Whether to cache responses

  • retry_count – Number of retries after request failed before giving up

  • retry_sleep – Sleep time between retries

  • retry_multiplier – Multiplier for sleep time between retries

  • ratelimit_rate – Number of requests per period (β€œdrops” in leaky bucket)

  • ratelimit_period – Time period for rate limiting in seconds

  • connection_limit – Number of simultaneous connections

  • connection_timeout – Connection timeout in seconds

  • batch_size – Number of items fetched in a single paginated request (for some APIs)

merge(other: Optional[dipdup.config.HTTPConfig]) dipdup.config.HTTPConfigΒΆ

Set missing values from other config

class dipdup.config.HandlerConfig(callback: str)ΒΆ
class dipdup.config.HasuraConfig(url: str, admin_secret: Optional[str] = None, source: str = 'default', select_limit: int = 100, allow_aggregations: bool = True, camel_case: bool = False, rest: bool = True, http: Optional[dipdup.config.HTTPConfig] = None)ΒΆ

Config for the Hasura integration.

Parameters
  • url – URL of the Hasura instance.

  • admin_secret – Admin secret of the Hasura instance.

  • source – Hasura source for DipDup to configure, others will be left untouched.

  • select_limit – Row limit for unauthenticated queries.

  • allow_aggregations – Whether to allow aggregations in unauthenticated queries.

  • camel_case – Whether to use camelCase instead of default pascal_case for the field names (incompatible with metadata_interface flag)

  • rest – Enable REST API both for autogenerated and custom queries.

  • http – HTTP connection tunables

property headers: Dict[str, str]ΒΆ

Headers to include with every request

class dipdup.config.HeadHandlerConfig(callback: str)ΒΆ

Head block handler config

class dipdup.config.HeadIndexConfig(kind: Literal['head'], datasource: Union[str, dipdup.config.TzktDatasourceConfig], handlers: Tuple[dipdup.config.HeadHandlerConfig, ...])ΒΆ

Head block index config

class dipdup.config.HookConfig(callback: str, args: typing.Dict[str, str] = <factory>, atomic: bool = False)ΒΆ

Hook config

Parameters
  • args – Mapping of argument names and annotations (checked lazily when possible)

  • atomic – Wrap hook in a single database transaction

class dipdup.config.HttpDatasourceConfig(kind: Literal['http'], url: str, http: Optional[dipdup.config.HTTPConfig] = None)ΒΆ

Generic HTTP datasource config

Parameters
  • kind – always 'http'

  • url – URL to fetch data from

  • http – HTTP client configuration

class dipdup.config.IndexConfig(kind: str, datasource: Union[str, dipdup.config.TzktDatasourceConfig])ΒΆ

Index config

Parameters

datasource – Alias of index datasource in datasources section

hash() strΒΆ

Calculate hash to ensure config has not changed since last run.

class dipdup.config.IndexTemplateConfig(template: str, values: Dict[str, str], first_level: int = 0, last_level: int = 0)ΒΆ

Index template config

Parameters
  • kind – always template

  • name – Name of index template

  • template_values – Values to be substituted in template (<key> -> value)

  • first_level – Level to start indexing from

  • last_level – Level to stop indexing at (DipDup will terminate at this level)

class dipdup.config.IpfsDatasourceConfig(kind: Literal['ipfs'], url: str = 'https://ipfs.io/ipfs', http: Optional[dipdup.config.HTTPConfig] = None)ΒΆ

IPFS datasource config

Parameters
  • kind – always β€˜ipfs’

  • url – IPFS node URL, e.g. https://ipfs.io/ipfs/

  • http – HTTP client configuration

class dipdup.config.JobConfig(hook: typing.Union[str, dipdup.config.HookConfig], crontab: typing.Optional[str] = None, interval: typing.Optional[int] = None, daemon: bool = False, args: typing.Dict[str, typing.Any] = <factory>)ΒΆ

Job schedule config

Parameters
  • hook – Name of hook to run

  • crontab – Schedule with crontab syntax (* * * * *)

  • interval – Schedule with interval in seconds

  • daemon – Run hook as a daemon (never stops)

  • args – Arguments to pass to the hook

class dipdup.config.LoggingConfig(config: Dict[str, Any])ΒΆ
class dipdup.config.MetadataDatasourceConfig(kind: Literal['metadata'], network: dipdup.datasources.metadata.enums.MetadataNetwork, url: str = 'https://metadata.dipdup.net', http: Optional[dipdup.config.HTTPConfig] = None)ΒΆ

DipDup Metadata datasource config

Parameters
  • kind – always β€˜metadata’

  • network – Network name, e.g. mainnet, hangzhounet, etc.

  • url – GraphQL API URL, e.g. https://metadata.dipdup.net

  • http – HTTP client configuration

class dipdup.config.NameMixinΒΆ
class dipdup.config.OperationHandlerConfig(callback: str, pattern: Tuple[Union[dipdup.config.OperationHandlerOriginationPatternConfig, dipdup.config.OperationHandlerTransactionPatternConfig], ...])ΒΆ

Operation handler config

Parameters
  • callback – Name of method in handlers package

  • pattern – Filters to match operation groups

class dipdup.config.OperationHandlerOriginationPatternConfig(type: Literal['origination'] = 'origination', source: Optional[Union[str, dipdup.config.ContractConfig]] = None, similar_to: Optional[Union[str, dipdup.config.ContractConfig]] = None, originated_contract: Optional[Union[str, dipdup.config.ContractConfig]] = None, optional: bool = False, strict: bool = False)ΒΆ

Origination handler pattern config

Parameters
  • type – always β€˜origination’

  • source – Match operations by source contract alias

  • similar_to – Match operations which have the same code/signature (depending on strict field)

  • originated_contract – Match origination of exact contract

  • optional – Whether the operation can be missing in the operation group

  • strict – Match operations by storage only or by the whole code

class dipdup.config.OperationHandlerTransactionPatternConfig(type: Literal['transaction'] = 'transaction', source: Optional[Union[str, dipdup.config.ContractConfig]] = None, destination: Optional[Union[str, dipdup.config.ContractConfig]] = None, entrypoint: Optional[str] = None, optional: bool = False)ΒΆ

Operation handler pattern config

Parameters
  • type – always β€˜transaction’

  • source – Match operations by source contract alias

  • destination – Match operations by destination contract alias

  • entrypoint – Match operations by contract entrypoint

  • optional – Whether the operation can be missing in the operation group

class dipdup.config.OperationIndexConfig(kind: typing.Literal['operation'], datasource: typing.Union[str, dipdup.config.TzktDatasourceConfig], handlers: typing.Tuple[dipdup.config.OperationHandlerConfig, ...], types: typing.Tuple[dipdup.enums.OperationType, ...] = (<OperationType.transaction: 'transaction'>,), contracts: typing.List[typing.Union[str, dipdup.config.ContractConfig]] = <factory>, first_level: int = 0, last_level: int = 0)ΒΆ

Operation index config

Parameters
  • kind – always operation

  • handlers – List of indexer handlers

  • types – Types of transaction to fetch

  • contracts – Aliases of contracts being indexed in contracts section

  • first_level – Level to start indexing from

  • last_level – Level to stop indexing at (DipDup will terminate at this level)

property address_filter: Set[str]ΒΆ

Set of addresses (any field) to filter operations with before an actual matching

property entrypoint_filter: Set[Optional[str]]ΒΆ

Set of entrypoints to filter operations with before an actual matching

class dipdup.config.ParameterTypeMixinΒΆ

parameter_type_cls field

class dipdup.config.ParentMixinΒΆ

parent field for index and template configs

class dipdup.config.PatternConfigΒΆ
class dipdup.config.PostgresDatabaseConfig(kind: typing.Literal['postgres'], host: str, user: str = 'postgres', database: str = 'postgres', port: int = 5432, schema_name: str = 'public', password: str = '', immune_tables: typing.Tuple[str, ...] = <factory>, connection_timeout: int = 60)ΒΆ

Postgres database connection config

Parameters
  • kind – always β€˜postgres’

  • host – Host

  • port – Port

  • user – User

  • password – Password

  • database – Database name

  • schema_name – Schema name

  • immune_tables – List of tables to preserve during reindexing

  • connection_timeout – Connection timeout

class dipdup.config.PrometheusConfig(host: str, port: int = 8000, update_interval: float = 1.0)ΒΆ

Config for Prometheus integration.

Parameters
  • host – Host to bind to

  • port – Port to bind to

  • update_interval – Interval to update some metrics in seconds

class dipdup.config.SentryConfig(dsn: str, environment: Optional[str] = None, debug: bool = False)ΒΆ

Config for Sentry integration.

Parameters
  • dsn – DSN of the Sentry instance

  • environment – Environment to report to Sentry (informational only)

  • debug – Catch warning messages and more context

class dipdup.config.SqliteDatabaseConfig(kind: Literal['sqlite'], path: str = ':memory:')ΒΆ

SQLite connection config

Parameters
  • kind – always β€˜sqlite’

  • path – Path to .sqlite3 file, leave default for in-memory database (:memory:)

class dipdup.config.StorageTypeMixinΒΆ

storage_type_cls field

class dipdup.config.SubscriptionsMixinΒΆ

subscriptions field

class dipdup.config.TemplateValuesMixinΒΆ

template_values field

class dipdup.config.TokenTransferHandlerConfig(callback: str)ΒΆ
class dipdup.config.TokenTransferIndexConfig(kind: typing.Literal['token_transfer'], datasource: typing.Union[str, dipdup.config.TzktDatasourceConfig], handlers: typing.Tuple[dipdup.config.TokenTransferHandlerConfig, ...] = <factory>, first_level: int = 0, last_level: int = 0)ΒΆ

Token index config

class dipdup.config.TransactionIdxMixinΒΆ

transaction_idx field to track index of operation in group

Parameters

transaction_idx –

class dipdup.config.TzktDatasourceConfig(kind: Literal['tzkt'], url: str, http: Optional[dipdup.config.HTTPConfig] = None, buffer_size: int = 0)ΒΆ

TzKT datasource config

Parameters
  • kind – always β€˜tzkt’

  • url – Base API URL, e.g. https://api.tzkt.io/

  • http – HTTP client configuration

  • buffer_size – Number of levels to keep in FIFO buffer before processing

advanced

advanced:
  early_realtime: False
  merge_subscriptions: False
  postpone_jobs: False
  reindex:
    manual: wipe
    migration: exception
    rollback: ignore
    config_modified: exception
    schema_modified: exception

This config section allows users to tune some system-wide options, either experimental or unsuitable for generic configurations.

| field | description |
|---|---|
| reindex | Mapping of reindexing reasons and actions DipDup performs |
| scheduler | apscheduler scheduler config |
| postpone_jobs | Do not start job scheduler until all indexes are in realtime state |
| early_realtime | Establish realtime connection immediately after startup |
| merge_subscriptions | Subscribe to all operations instead of exact channels |
| metadata_interface | Expose metadata interface for TzKT |

CLI flags take priority over the AdvancedConfig fields of the same name.

πŸ€“ SEE ALSO

contracts

A list of the contract definitions you might use in the indexer patterns or templates. Each contract entry has two fields:

  • address β€” either originated or implicit account address encoded in base58.
  • typename β€” an alias for the particular contract script, meaning that two contracts sharing the same code can have the same type name.
contracts:
  kusd_dex_mainnet:
    address: KT1CiSKXR68qYSxnbzjwvfeMCRburaSDonT2
    typename: quipu_fa12
  tzbtc_dex_mainnet:
    address: KT1N1wwNPqT5jGhM91GQ2ae5uY8UzFaXHMJS
    typename: quipu_fa12
  kusd_token_mainnet:
    address: KT1K9gCRgaLRFKTErYt1wVxA3Frb9FjasjTV
    typename: kusd_token
  tzbtc_token_mainnet:
    address: KT1PWx2mnDueood7fEmfbBDKx1D9BAnnXitn
    typename: tzbtc_token

A typename field is only required when using index templates, but it helps to improve the readability of auto-generated code and avoid repetition.

A contract entry does not contain information about the network, so it's a good idea to include the network name in the alias. This design choice makes generic index parameterization via templates possible. See 4.5. Templates and variables for details.

If multiple contracts you index have the same interface but different code, see 8.2. Reusing typename for different contracts.

database

DipDup supports several database engines for development and production. The obligatory field kind specifies which engine has to be used:

  • sqlite
  • postgres (and compatible engines)

Database engines article may help you choose a database that better suits your needs.

SQLite

path field must be either path to the .sqlite3 file or :memory: to keep a database in memory only (default):

database:
  kind: sqlite
  path: db.sqlite3

| field | description |
|---|---|
| kind | always 'sqlite' |
| path | Path to .sqlite3 file, leave default for in-memory database |

PostgreSQL

Requires host, port, user, password, and database fields. You can set schema_name to values other than public, but Hasura integration won't be available.

database:
  kind: postgres
  host: db
  port: 5432
  user: dipdup
  password: ${POSTGRES_PASSWORD:-changeme}
  database: dipdup
  schema_name: public

| field | description |
|---|---|
| kind | always 'postgres' |
| host | Host |
| port | Port |
| user | User |
| password | Password |
| database | Database name |
| schema_name | Schema name |
| immune_tables | List of tables to preserve during reindexing |
| connection_timeout | Connection timeout in seconds |

You can also use compose-style environment variable substitutions with default values for secrets and other fields. See Templates and variables for details.

Immune tables

In some cases, DipDup can't continue indexing with an existing database. See 5.3. Reindexing for details. One of the solutions to resolve reindexing state is to drop the database and start indexing from scratch. To achieve this, either invoke schema wipe command or set an action to wipe in advanced.reindex config section.

You might want to keep several tables during schema wipe if data in them is not dependent on index states yet heavy. A typical example is indexing IPFS data β€” rollbacks do not affect off-chain storage, so you can safely continue after receiving a reorg message.

database:
  immune_tables:
    - token_metadata
    - contract_metadata

immune_tables is an optional array of table names that will be ignored during schema wipe. Note that to change the schema of an immune table, you need to perform a migration by yourself. DipDup will neither drop the table nor automatically handle the update.

datasources

A list of API endpoints DipDup uses to retrieve indexing data to process.

A datasource config entry is an alias for the endpoint URI; there's no network mention. Thus it's good to add a network name to the datasource alias, e.g. tzkt_mainnet.

tzkt

datasources:
  tzkt:
    kind: tzkt
    url: ${TZKT_URL:-https://api.tzkt.io}
    http:
      cache: false
      retry_count:  # retry infinitely
      retry_sleep:
      retry_multiplier:
      ratelimit_rate:
      ratelimit_period:
      connection_limit: 100
      connection_timeout: 60
      batch_size: 10000
    buffer_size: 0

coinbase

datasources:
  coinbase:
    kind: coinbase

dipdup-metadata

datasources:
  metadata:
    kind: metadata
    url: https://metadata.dipdup.net
    network: mainnet|hangzhounet

ipfs

datasources:
  ipfs:
    kind: ipfs
    url: https://ipfs.io/ipfs

πŸ€“ SEE ALSO

hasura

This optional section is used by the DipDup executor to automatically configure the Hasura engine to track your tables.

hasura:
  url: http://hasura:8080
  admin_secret: ${HASURA_ADMIN_SECRET:-changeme}
  allow_aggregations: false
  camel_case: true
  rest: true
  select_limit: 100
  source: default

πŸ€“ SEE ALSO

hooks

Hooks are user-defined callbacks you can execute with a job scheduler or within another callback (with ctx.fire_hook).

hooks:
  calculate_stats:
    callback: calculate_stats
    atomic: False
    args:
     major: bool
     depth: int
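
Each hook needs a callback with a matching name and signature in the hooks package. A minimal sketch of the corresponding stub (typically <package>/hooks/calculate_stats.py; argument names and types mirror the args mapping, the body is up to you):

from dipdup.context import HookContext


async def calculate_stats(
    ctx: HookContext,
    major: bool,
    depth: int,
) -> None:
    # Perform the actual work here, e.g. run SQL scripts or update models
    ...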

πŸ€“ SEE ALSO

indexes

An index is a basic DipDup entity connecting the inventory and specifying data handling rules.

Each index has a unique string identifier acting as a key under indexes config section:

indexes:
  my_index:
    kind: operation
    datasource: tzkt_mainnet

There can be various index kinds; currently, two possible options are supported for the kind field:

  • operation
  • big_map

All the indexes have to specify the datasource field, an alias of an existing entry under the datasources section.

Indexing scope

One can optionally specify block levels DipDup has to start and stop indexing at, e.g., there's a new version of the contract, and it will be more efficient to stop handling the old one.

indexes:
  my_index:
    first_level: 1000000
    last_level: 2000000

big_map

big_map index allows querying only updates of a specific big map (or several). In some cases, it can drastically reduce the amount of data transferred and speed up the indexing process.

indexes:
  my_index:
    kind: big_map
    datasource: tzkt
    skip_history: never
    handlers:
      - callback: on_ledger_update
        contract: contract1
        path: data.ledger
      - callback: on_token_metadata_update
        contract: contract1
        path: token_metadata

Handlers

Each big_map handler contains three required fields:

  • callback β€” name of the async function with a particular signature; DipDup will try to load it from the module with the same name <package_name>.handlers.<callback>
  • contract β€” Big map parent contract (from the inventory)
  • path β€” path to the Big map in the contract storage (use dot as a delimiter)
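
For illustration, a callback for the first handler above might look like the sketch below. The typeclass and argument names are hypothetical; the actual ones are generated by dipdup init from the big map path:

from dipdup.context import HandlerContext
from dipdup.models import BigMapDiff

# Hypothetical typeclasses generated by `dipdup init` for the data.ledger path
from demo_project.types.contract1.big_map.data_ledger_key import DataLedgerKey
from demo_project.types.contract1.big_map.data_ledger_value import DataLedgerValue


async def on_ledger_update(
    ctx: HandlerContext,
    data_ledger: BigMapDiff[DataLedgerKey, DataLedgerValue],
) -> None:
    # data_ledger.key and data_ledger.value hold the typed diff content
    ...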

Index only the current state of big maps

When the skip_history field is set to once, DipDup will skip historical changes only on initial sync and switch to regular indexing afterward. When the value is always, DipDup will fetch all big map keys on every restart. The preferable mode depends on your workload.

All big map diffs DipDup passes to handlers during fast sync have the action field set to BigMapAction.ADD_KEY. Keep in mind that DipDup fetches all keys in this mode, including ones removed from the big map. You can filter out the latter by the BigMapDiff.data.active field if needed.

head

🚧 UNDER CONSTRUCTION

This page or paragraph is yet to be written. Come back later.

operation

Operation index allows you to query only those operations related to your DApp and do pattern matching on its content (internal calls chain). It is the closest thing to fully-fledged event logs.

Filters

DipDup supports filtering operations by kind, source, destination (if applicable), and originated_contract (if applicable).

DipDup fetches only applied operations.

contracts

indexes:
  my_index:
    kind: operation
    datasource: tzkt
    contracts:
      - contract1
      - contract2

In this example, DipDup will fetch all the operations where either source or destination equals the contract1 or contract2 address. The contracts field is obligatory; there has to be at least one contract alias (from the inventory).

types

By default, DipDup works only with transactions, but you can explicitly list operation types you want to subscribe to (currently transaction and origination types are supported):

indexes:
  my_index:
    kind: operation
    datasource: tzkt
    contracts:
      - contract1
    types:
      - transaction
      - origination

Note that in the case of originations, DipDup will query operations where either source or originated contract address is equal to contract1.

Handlers

Each operation handler contains two required fields:

  • callback β€” name of the async function with a particular signature; DipDup will try to load it from the module with the same name <package_name>.handlers.<callback>
  • pattern β€” a non-empty list of items that have to be matched
indexes:
  my_index:
    kind: operation
    datasource: tzkt
    contracts:
      - contract1
    handlers:
      - callback: on_call
        pattern:
          - destination: contract1
            entrypoint: call        

You can think of operation pattern as a regular expression on a sequence of operations (both external and internal) with global flag enabled (can be multiple matches) and where various operation parameters (type, source, destination, entrypoint, originated contract) are used for matching.

Pattern

Here are the supported filters for matching operations (all optional):

  • type — (either transaction or origination) usually inferred from the presence of other fields
  • destination — invoked contract alias (from the inventory)
  • entrypoint — invoked entrypoint name
  • source — operation sender alias (from the inventory)
  • originated_contract — originated contract alias (from the inventory)
  • similar_to — originated contract has the same parameter and storage types as the reference one (from the inventory)
  • strict — strengthens the similar_to filter by comparing the entire code rather than just parameter+storage
  • optional — continue matching even if this item is not found (with limitations, see below)

It's unnecessary to match the entire operation content; you can skip external/internal calls that are not relevant. However, there is a limitation: optional items cannot be followed by operations ignored by the pattern.

pattern:
  - destination: contract_1
    entrypoint: call_1
  - destination: contract_2
    entrypoint: internal_call_2
  - source: contract_1
    type: transaction
  - source: contract_2
    type: origination
    similar_to: contract_3
    strict: true
    optional: true

You will get slightly different callback argument types depending on whether you specify destination+entrypoint for transactions and originated_contract for originations. In the first case, DipDup will generate a dataclass for the particular entrypoint/storage; in the second case, it will not (meaning you will have to handle untyped parameters/storage updates).
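
For instance, with destination+entrypoint specified as in the on_call example above, the generated stub receives a typed transaction roughly like the sketch below (module paths of the generated typeclasses are illustrative and depend on your package):

from dipdup.context import HandlerContext
from dipdup.models import Transaction

# Hypothetical typeclasses generated by `dipdup init`
from demo_project.types.contract1.parameter.call import CallParameter
from demo_project.types.contract1.storage import Contract1Storage


async def on_call(
    ctx: HandlerContext,
    call: Transaction[CallParameter, Contract1Storage],
) -> None:
    parameter = call.parameter  # typed entrypoint arguments
    storage = call.storage  # typed contract storage after the call
    ...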

template

This index type is used for creating a static template instance.

indexes:
  my_index:
    template: my_template
    values:
      placeholder1: value1
      placeholder2: value2

For a static template instance (specified in the DipDup config) there are two fields:

  • template β€” template name (from templates section)
  • values β€” concrete values for each placeholder used in a chosen template

jobs

Add the following section to DipDup config:

jobs:
  midnight_stats:
    hook: calculate_stats
    crontab: "0 0 * * *"
    args:
      major: True
  leet_stats:
    hook: calculate_stats
    interval: 1337  # in seconds
    args:
      major: False

If you're not familiar with the crontab syntax, there's an online service crontab.guru that will help you build the desired expression.

package

DipDup uses this field to discover the Python package of your project.

package: my_indexer_name

DipDup will search for a package named my_indexer_name on PYTHONPATH.

This field allows you to decouple the DipDup configuration file from the indexer implementation and gives more flexibility in managing the source code.

See 4.4. Project structure for details.

prometheus

prometheus:
  host: 0.0.0.0

Prometheus integration options

| field | description |
|---|---|
| host | Host to bind to |
| port | Port to bind to |
| update_interval | Interval to update some metrics in seconds |

sentry

sentry:
  dsn: https://...
  environment: dev
  debug: False

| field | description |
|---|---|
| dsn | DSN of the Sentry instance |
| environment | Environment to report to Sentry (informational only) |
| debug | Catch warning messages and more context |

πŸ€“ SEE ALSO

spec_version

DipDup specification version is used to determine the compatibility of the toolkit and configuration file format and/or features.

spec_version: 1.2

This table shows which specific SDK releases support which DipDup file versions.

| spec_version value | Supported DipDup versions |
|---|---|
| 0.1 | >=0.0.1, <=0.4.3 |
| 1.0 | >=1.0.0, <=1.1.2 |
| 1.1 | >=2.0.0, <=2.0.9 |
| 1.2 | >=3.0.0 |

If you're getting MigrationRequiredError after updating the framework, run dipdup migrate command to perform project migration.

templates

indexes:
  foo:
    kind: template
    name: bar
    first_level: 12341234
    template_values:
      network: mainnet

templates:
  bar:
    kind" index
    datasource: tzkt_<network>  # resolves into `tzkt_mainnet`
    ...

| field | description |
|---|---|
| kind | always template |
| name | Name of index template |
| template_values | Values to be substituted in template (<key> → value) |
| first_level | Level to start indexing from |
| last_level | Level to stop indexing at (DipDup will terminate at this level) |

Changelog

5.1.7 - 2022-06-15

Fixed

  • index: Fixed token_transfer index not receiving realtime updates.

5.1.6 - 2022-06-08

Fixed

  • cli: Commands with --help option no longer require a working DipDup config.
  • index: Fixed crash with RuntimeError after continuous realtime connection loss.

Performance

  • cli: Lazy import dependencies to speed up startup.

Other

  • docs: Migrate docs from GitBook to mdbook.

5.1.5 - 2022-06-05

Fixed

  • config: Fixed crash when rollback hook is about to be called.

5.1.4 - 2022-06-02

Fixed

  • config: Fixed OperationIndexConfig.types field being partially ignored.
  • index: Allow mixing oneshot and regular indexes in a single config.
  • index: Call rollback hook instead of triggering reindex when single-level rollback has failed.
  • index: Fixed crash with RuntimeError after continuous realtime connection loss.
  • tzkt: Fixed origination subscription missing when merge_subscriptions flag is set.

Performance

  • ci: Decrease the size of generic and -pytezos Docker images by 11% and 16%, respectively.

5.1.3 - 2022-05-26

Fixed

  • database: Fixed special characters in password not being URL encoded.

Performance

  • context: Do not reinitialize config when adding a single index.

5.1.2 - 2022-05-24

Added

  • tzkt: Added originated_contract_tzips field to OperationData.

Fixed

  • jobs: Fixed jobs with daemon schedule never start.
  • jobs: Fixed failed jobs not throwing exceptions into the main loop.

Other

  • database: Tortoise ORM updated to 0.19.1.

5.1.1 - 2022-05-13

Fixed

  • index: Ignore indexes with different message types on rollback.
  • metadata: Add ithacanet to available networks.

5.1.0 - 2022-05-12

Added

  • ci: Push X and X.Y tags to the Docker Hub on release.
  • cli: Added config env command to export env-file with default values.
  • cli: Show warning when running an outdated version of DipDup.
  • hooks: Added a new hook on_index_rollback to perform per-index rollbacks.

Fixed

  • index: Fixed fetching migration operations.
  • tzkt: Fixed possible data corruption when using the buffer_size option.
  • tzkt: Fixed reconnection due to websockets message size limit.

Deprecated

  • hooks: The on_rollback default hook is superseded by on_index_rollback and will be removed later.

5.0.4 - 2022-05-05

Fixed

  • exceptions: Fixed incorrect formatting and broken links in help messages.
  • index: Fixed crash when the only index in config is head.
  • index: Fixed fetching originations during the initial sync.

5.0.3 - 2022-05-04

Fixed

  • index: Fixed crash when no block with the same level arrived after a single-level rollback.
  • index: Fixed setting initial index level when IndexConfig.first_level is set.
  • tzkt: Fixed delayed emitting of buffered realtime messages.
  • tzkt: Fixed inconsistent behavior of first_level/last_level arguments in different getter methods.

5.0.2 - 2022-04-21

Fixed

  • context: Fixed reporting incorrect reindexing reason.
  • exceptions: Fixed crash with FrozenInstanceError when an exception is raised from a callback.
  • jobs: Fixed graceful shutdown of daemon jobs.

Improved

  • codegen: Refined on_rollback hook template.
  • exceptions: Updated help messages for known exceptions.
  • tzkt: Do not request reindexing if missing subgroups have matched no handlers.

5.0.1 - 2022-04-12

Fixed

  • cli: Fixed schema init command crash with SQLite databases.
  • index: Fixed spawning datasources in oneshot mode.
  • tzkt: Fixed processing realtime messages.

5.0.0 - 2022-04-08

This release contains no changes except for the version number.

5.0.0-rc4 - 2022-04-04

Added

  • tzkt: Added ability to process realtime messages with lag.

4.2.7 - 2022-04-02

Fixed

  • config: Fixed jobs config section validation.
  • hasura: Fixed metadata generation for v2.3.0 and above.
  • tzkt: Fixed get_originated_contracts and get_similar_contracts methods response.

5.0.0-rc3 - 2022-03-28

Added

  • config: Added custom section to store arbitrary user data.

Fixed

  • config: Fixed default SQLite path (:memory:).
  • tzkt: Fixed pagination in several getter methods.
  • tzkt: Fixed data loss when skip_history option is enabled.

Removed

  • config: Removed dummy advanced.oneshot flag.
  • cli: Removed docker init command.
  • cli: Removed dummy schema approve --hashes flag.

5.0.0-rc2 - 2022-03-13

Fixed

  • tzkt: Fixed crash in methods that do not support cursor pagination.
  • prometheus: Fixed invalid metric labels.

5.0.0-rc1 - 2022-03-02

Added

  • metadata: Added metadata_interface feature flag to expose metadata in TzKT format.
  • prometheus: Added ability to expose Prometheus metrics.
  • tzkt: Added missing fields to the HeadBlockData model.
  • tzkt: Added iter_... methods to iterate over item batches.

Fixed

  • tzkt: Fixed possible OOM while calling methods that support pagination.
  • tzkt: Fixed possible data loss in get_originations and get_quotes methods.

Changed

  • tzkt: Added offset and limit arguments to all methods that support pagination.

Removed

  • bcd: Removed bcd datasource and config section.

Performance

  • dipdup: Use fast orjson library instead of built-in json where possible.

4.2.6 - 2022-02-25

Fixed

  • database: Fixed generating table names from uppercase model names.
  • http: Fixed bug that leads to caching invalid responses on the disk.
  • tzkt: Fixed processing realtime messages with data from multiple levels.

4.2.5 - 2022-02-21

Fixed

  • database: Do not add the schema argument to the PostgreSQL connection string when not needed.
  • hasura: Wait for Hasura to be configured before starting indexing.

4.2.4 - 2022-02-14

Added

  • config: Added http datasource to make arbitrary HTTP requests.

Fixed

  • context: Fixed crash when calling fire_hook method.
  • context: Fixed HookConfig.atomic flag, which was ignored in fire_hook method.
  • database: Create missing tables even if Schema model is present.
  • database: Fixed excess increasing of decimal context precision.
  • index: Fixed loading handler callbacks from nested packages (@veqtor).

Other

  • ci: Added GitHub Action to build and publish Docker images for each PR opened.

4.2.3 - 2022-02-08

Fixed

  • ci: Removed black 21.12b0 dependency since bug in datamodel-codegen-generator is fixed.
  • cli: Fixed config export command crash when advanced.reindex dictionary is present.
  • cli: Removed optionals from config export output so the result can be loaded again.
  • config: Verify advanced.scheduler config for the correctness and unsupported features.
  • context: Fixed ignored wait argument of fire_hook method.
  • hasura: Fixed processing relation fields with missing related_name.
  • jobs: Fixed default apscheduler config.
  • tzkt: Fixed crash occurring when reorg message is the first one received by the datasource.

4.2.2 - 2022-02-01

Fixed

  • config: Fixed ipfs datasource config.

4.2.1 - 2022-01-31

Fixed

  • ci: Added black 21.12b0 dependency to avoid possible conflict with datamodel-codegen-generator.

4.2.0 - 2022-01-31

Added

  • context: Added wait argument to fire_hook method to escape current transaction context.
  • context: Added ctx.get_<kind>_datasource helpers to avoid type casting.
  • hooks: Added ability to configure apscheduler with AdvancedConfig.scheduler field.
  • http: Added request method to send arbitrary requests (affects all datasources).
  • ipfs: Added ipfs datasource to download JSON and binary data from IPFS.

Fixed

  • http: Removed dangerous method close_session.
  • context: Fixed help message of IndexAlreadyExistsError exception.

Deprecated

  • bcd: Added deprecation notice.

Other

  • dipdup: Removed unused internal methods.

4.1.2 - 2022-01-27

Added

  • cli: Added schema wipe --force argument to skip confirmation prompt.

Fixed

  • cli: Show warning about deprecated --hashes argument
  • cli: Ignore SIGINT signal when shutdown is in progress.
  • sentry: Ignore exceptions when shutdown is in progress.

4.1.1 - 2022-01-25

Fixed

  • cli: Fixed stacktraces missing on exception.
  • cli: Fixed wrapping OSError with ConfigurationError during config loading.
  • hasura: Fixed printing help messages on HasuraError.
  • hasura: Preserve a list of sources in Hasura Cloud environments.
  • hasura: Fixed HasuraConfig.source config option.

Changed

  • cli: Unknown exceptions are no longer wrapped with DipDupError.

Performance

  • hasura: Removed some useless requests.

4.1.0 - 2022-01-24

Added

  • cli: Added schema init command to initialize database schema.
  • cli: Added --force flag to hasura configure command.
  • codegen: Added support for subpackages inside callback directories.
  • hasura: Added dipdup_head_status view and REST endpoint.
  • index: Added an ability to skip historical data while synchronizing big_map indexes.
  • metadata: Added metadata datasource.
  • tzkt: Added get_big_map and get_contract_big_maps datasource methods.

4.0.5 - 2022-01-20

Fixed

  • index: Fixed deserializing manually modified typeclasses.

4.0.4 - 2022-01-17

Added

  • cli: Added --keep-schemas flag to init command to preserve JSONSchemas along with generated types.

Fixed

  • demos: Tezos Domains and Homebase DAO demos were updated from edo2net to mainnet contracts.
  • hasura: Fixed missing relations for models with ManyToManyField fields.
  • tzkt: Fixed parsing storage with nested structures.

Performance

  • dipdup: Minor overall performance improvements.

Other

  • ci: Cache virtual environment in GitHub Actions.
  • ci: Detect CI environment and skip tests that fail in GitHub Actions.
  • ci: Execute tests in parallel with pytest-xdist when possible.
  • ci: More strict linting rules of flake8.

4.0.3 - 2022-01-09

Fixed

  • tzkt: Fixed parsing parameter with an optional value.

4.0.2 - 2022-01-06

Added

  • tzkt: Added optional delegate_address and delegate_alias fields to OperationData.

Fixed

  • tzkt: Fixed crash due to unprocessed pysignalr exception.
  • tzkt: Fixed parsing OperationData.amount field.
  • tzkt: Fixed parsing storage with top-level boolean fields.

4.0.1 - 2021-12-30

Fixed

  • codegen: Fixed generating storage typeclasses with Union fields.
  • codegen: Fixed preprocessing contract JSONSchema.
  • index: Fixed processing reindexing reason saved in the database.
  • tzkt: Fixed processing operations with default entrypoint and empty parameter.
  • tzkt: Fixed crash while recursively applying bigmap diffs to the storage.

Performance

  • tzkt: Increased speed of applying bigmap diffs to operation storage.

4.0.0 - 2021-12-24

This release contains no changes except for the version number.

4.0.0-rc4 - 2021-12-20

Fixed

  • cli: Fixed missing schema approve --hashes argument.
  • codegen: Fixed contract address used instead of an alias when typename is not set.
  • tzkt: Fixed processing operations with entrypoint default.
  • tzkt: Fixed regression in processing migration originations.
  • tzkt: Fixed filtering of big map diffs by the path.

Removed

  • cli: Removed deprecated run --oneshot argument and clear-cache command.

4.0.0-rc2 - 2021-12-11

⚠ Migration

  • Run dipdup init command to generate on_synchronized hook stubs.

Added

  • hooks: Added on_synchronized hook, which fires each time all indexes reach realtime state.

Fixed

  • cli: Fixed config not being verified when invoking some commands.
  • codegen: Fixed generating callback arguments for untyped operations.
  • index: Fixed incorrect log messages, remove duplicate ones.
  • index: Fixed crash while processing storage of some contracts.
  • index: Fixed matching of untyped operations filtered by source field (@pravin-d).

Performance

  • index: Checks performed on each iteration of the main DipDup loop are slightly faster now.

4.0.0-rc1 - 2021-12-02

⚠ Migration

  • Run dipdup schema approve command on every database you want to use with 4.0.0-rc1. Running dipdup migrate is not necessary since spec_version hasn't changed in this release.

Added

  • cli: Added run --early-realtime flag to establish a realtime connection before all indexes are synchronized.
  • cli: Added run --merge-subscriptions flag to subscribe to all operations/big map diffs during realtime indexing.
  • cli: Added status command to print the current status of indexes from the database.
  • cli: Added config export [--unsafe] command to print config after resolving all links and variables.
  • cli: Added cache show command to get information about file caches used by DipDup.
  • config: Added first_level and last_level optional fields to TemplateIndexConfig. These limits are applied after ones from the template itself.
  • config: Added daemon boolean field to JobConfig to run a single callback indefinitely. Conflicts with crontab and interval fields.
  • config: Added advanced top-level section.

Fixed

  • cli: Fixed crashes and output inconsistency when piping DipDup commands.
  • cli: Fixed schema wipe --immune flag being ignored.
  • codegen: Fixed missing imports in handlers generated during init.
  • coinbase: Fixed possible data inconsistency caused by caching enabled for method get_candles.
  • http: Fixed increasing sleep time between failed request attempts.
  • index: Fixed invocation of head index callback.
  • index: Fixed CallbackError raised instead of ReindexingRequiredError in some cases.
  • tzkt: Fixed resubscribing when realtime connectivity is lost for a long time.
  • tzkt: Fixed sending useless subscription requests when adding indexes in runtime.
  • tzkt: Fixed get_originated_contracts and get_similar_contracts methods whose output was limited to HTTPConfig.batch_size field.
  • tzkt: Fixed lots of SignalR bugs by replacing aiosignalrcore library with pysignalr.

Changed

  • cli: dipdup schema wipe command now requires confirmation when invoked in the interactive shell.
  • cli: dipdup schema approve command now also causes a recalculation of schema and index config hashes.
  • index: DipDup will recalculate respective hashes if reindexing is triggered with config_modified: ignore or schema_modified: ignore in advanced config.

Deprecated

  • cli: run --oneshot option is deprecated and will be removed in the next major release. The oneshot mode applies automatically when last_level field is set in the index config.
  • cli: clear-cache command is deprecated and will be removed in the next major release. Use cache clear command instead.

Performance

  • config: Configuration files are loaded 10x times faster.
  • index: Number of operations processed by matcher reduced by 40%-95% depending on the number of addresses and entrypoints used.
  • tzkt: Rate limit was increased. Try to set connection_timeout to a higher value if requests fail with ConnectionTimeout exception.
  • tzkt: Improved performance of response deserialization.

3.1.3 - 2021-11-15

Fixed

  • codegen: Fixed missing imports in operation handlers.
  • codegen: Fixed invalid imports and arguments in big_map handlers.

3.1.2 - 2021-11-02

Fixed

  • Fixed crash occurred during synchronization of big map indexes.

3.1.1 - 2021-10-18

Fixed

  • Fixed loss of realtime subscriptions occurred after TzKT API outage.
  • Fixed updating schema hash in schema approve command.
  • Fixed possible crash occurred while Hasura is not ready.

3.1.0 - 2021-10-12

Added

  • New index class HeadIndex (configuration: dipdup.config.HeadIndexConfig). Use this index type to handle head (limited block header content) updates. This index type is realtime-only: historical data won't be indexed during the synchronization stage.
  • Added three new commands: schema approve, schema wipe, and schema export. Run dipdup schema --help command for details.

Changed

  • Triggering reindexing won't lead to dropping the database automatically anymore. ReindexingRequiredError is raised instead. --forbid-reindexing option has become default.
  • --reindex option is removed. Use dipdup schema wipe instead.
  • Values of dipdup_schema.reindex field updated to simplify querying database. See dipdup.enums.ReindexingReason class for possible values.

Fixed

  • Fixed ReindexRequiredError not being raised when running DipDup after reindexing was triggered.
  • Fixed index config hash calculation. Hashes of existing indexes in a database will be updated during the first run.
  • Fixed issue in BigMapIndex causing the partial loss of big map diffs.
  • Fixed printing help for CLI commands.
  • Fixed merging storage which contains specific nested structures.

Improved

  • Raise DatabaseConfigurationError exception when project models are not compatible with GraphQL.
  • Another bunch of performance optimizations. Reduced DB pressure, sped up parallel processing of lots of indexes.
  • Added initial set of performance benchmarks (run: ./scripts/run_benchmarks.sh)

3.0.4 - 2021-10-04

Improved

  • A significant increase in indexing speed.

Fixed

  • Fixed unexpected reindexing caused by the bug in processing zero- and single-level rollbacks.
  • Removed unnecessary file IO calls that could cause PermissionError exception in Docker environments.
  • Fixed possible violation of block-level atomicity during realtime indexing.

Changes

  • Public methods of TzktDatasource now return immutable sequences.

3.0.3 - 2021-10-01

Fixed

  • Fixed processing of single-level rollbacks emitted before rolled back head.

3.0.2 - 2021-09-30

Added

  • Human-readable CHANGELOG.md πŸ•Ί
  • Two new options added to dipdup run command:
    • --forbid-reindexing – raise ReindexingRequiredError instead of truncating database when reindexing is triggered for any reason. To continue indexing with existing database run UPDATE dipdup_schema SET reindex = NULL;
    • --postpone-jobs – job scheduler won't start until all indexes are synchronized.

Changed

  • Migration to this version requires reindexing.
  • dipdup_index.head_id foreign key removed. dipdup_head table still contains the latest blocks from Websocket received by each datasource.

Fixed

  • Removed unnecessary calls to TzKT API.
  • Fixed removal of PostgreSQL extensions (timescaledb, pgcrypto) by function truncate_database triggered on reindex.
  • Fixed creation of missing project package on init.
  • Fixed invalid handler callbacks generated on init.
  • Fixed detection of existing types in the project.
  • Fixed race condition caused by event emitter concurrency.
  • Capture unknown exceptions with Sentry before wrapping to DipDupError.
  • Fixed job scheduler start delay.
  • Fixed processing of reorg messages.

3.0.1 - 2021-09-24

Added

  • Added get_quote and get_quotes methods to TzKTDatasource.

Fixed

  • Defer spawning index datasources until initial sync is complete. It helps to mitigate some WebSocket-related crashes, but initial sync is a bit slower now.
  • Fixed possible race conditions in TzKTDatasource.
  • Start jobs scheduler after all indexes sync with a current head to speed up indexing.

Release notes

This section contains information about changes introduced with specific DipDup releases.

5.1.0

⚠ Migration from 5.0 (optional)

  • Run init command. Now you have two conflicting hooks: on_rollback and on_index_rollback. Follow the guide below to perform the migration. ConflictingHooksError exception will be raised until then.

What's New

Per-index rollback hook

In this release, we continue to improve the rollback-handling experience, which became much more important since the Ithaca protocol reached mainnet. Let's briefly recap how DipDup currently processes chain reorgs before calling a rollback hook:

  1. If the buffer_size option of a TzKT datasource is set to a non-zero value, and there are enough data messages buffered when a rollback occurs, data is just dropped from the buffer, and indexing continues.
  2. If all indexes in the config are operation ones, we can attempt to process a single-level rollback. All operations from the rolled back block must be present in the next one for the rollback to succeed. If some operations are missing, the on_rollback hook will be called as usual.
  3. Finally, we can safely ignore indexes with a level lower than the rollback target. The index level is updated either on synchronization or when at least one related operation or bigmap diff has been extracted from a realtime message.

If none of these tricks have worked, we can't process a rollback without custom logic. Here's where the changes begin. Before this release, every project contained the on_rollback hook, which receives a datasource: IndexDatasource argument and from/to levels. Even if your deployment has thousands of indexes and only a couple of them are affected by a rollback, there was no easy way to find out which ones.

Now on_rollback hook is deprecated and superseded by the on_index_rollback one. Choose one of the following options:

  • You haven't touched the on_rollback hook since project creation. Run init command and remove hooks/on_rollback and sql/on_rollback directories in project root. Default action (reindexing) has not changed.
  • You have some custom logic in on_rollback hook and want to leave it as-is for now. You can ignore introduced changes at least till the next major release.
  • You have implemented per-datasource rollback logic and are ready to switch to the per-index one. Run init, move your code to the on_index_rollback hook and delete the on_rollback one. Note that you can access the rolled back datasource via index.datasource; see the sketch after this list.
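
Below is a minimal sketch of a per-index rollback hook. The exact stub generated by init may differ slightly; the default action shown here, triggering reindexing for the affected index, matches the documented default:

from dipdup.context import HookContext
from dipdup.enums import ReindexingReason
from dipdup.index import Index


async def on_index_rollback(
    ctx: HookContext,
    index: Index,
    from_level: int,
    to_level: int,
) -> None:
    # The rolled back datasource is reachable via `index.datasource`.
    # Default action: trigger reindexing, passing the affected index
    # and levels as additional context.
    await ctx.reindex(
        ReindexingReason.rollback,
        index=index.name,
        from_level=from_level,
        to_level=to_level,
    )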

Token transfer index

Sometimes implementing an operation index is overkill for a specific task. An existing alternative is to use a big_map index to process only the diffs of selected big map paths. However, you still need to have a separate index for each contract of interest, which is very resource-consuming. A widespread case is indexing FA1.2/FA2 token contracts. So, this release introduces a new token_transfer index:

indexes:
  transfers:
    kind: token_transfer
    datasource: tzkt
    handlers:
      - callback: transfers

The TokenTransferData object is passed to the handler on each operation, containing only the information needed to process a token transfer.
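
A handler sketch for the config above might look like the following; the TokenTransferData import path and exact argument names are assumptions, so check the stub generated by init:

from dipdup.context import HandlerContext
from dipdup.models import TokenTransferData


async def transfers(
    ctx: HandlerContext,
    token_transfer: TokenTransferData,
) -> None:
    # Illustrative only: persist the transfer or update holder balances
    # using the fields exposed by TokenTransferData.
    ...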

config env command to generate env-files

Generally, it's good to separate a project config from deployment parameters, and DipDup has multiple options to achieve this. First of all, multiple configs can be chained successively, overriding top-level sections. Second, the DipDup config can contain docker-compose-style environment variable declarations. Let's say your config contains the following content:

database:
  kind: postgres
  host: db
  port: 5432
  user: ${POSTGRES_USER:-dipdup}
  password: ${POSTGRES_PASSWORD:-changeme}
  database: ${POSTGRES_DB:-dipdup}

You can generate an env-file to use with this exact config:

$ dipdup -c dipdup.yml -c dipdup.docker.yml config env
POSTGRES_USER=dipdup
POSTGRES_PASSWORD=changeme
POSTGRES_DB=dipdup

The environment of your current shell is also taken into account:

$ POSTGRES_DB=foobar dipdup -c dipdup.yml -c dipdup.docker.yml config env
POSTGRES_USER=dipdup
POSTGRES_PASSWORD=changeme
POSTGRES_DB=foobar  # <- set from current env

Use the -f <filename> option to save output to disk instead of printing to stdout. After you have modified the env-file according to your needs, you can apply it in whichever way is more convenient for you:

With dipdup --env-file / -e option:

dipdup -e prod.env <...> run

When using docker-compose:

services:
  indexer:
    ...
    env_file: prod.env

Keeping framework up-to-date

A bunch of new tags is now pushed to the Docker Hub on each release in addition to the X.Y.Z one: X.Y and X. That way, you can stick to a specific release without the risk of leaving a minor/major update unattended (friends don't let friends use latest πŸ˜‰). The -pytezos flavor is also available for each tag.

FROM dipdup/dipdup:5.1
...

In addition, DipDup will poll GitHub for new releases on each command that takes reasonably long to execute and print a warning when running an outdated version. You can disable these checks with the advanced.skip_version_check flag.

Pro tip: you can also enable notifications on the GitHub repo page with πŸ‘ Watch -> Custom -> tick Releases -> Apply to never miss a fresh DipDup release.

Changelog

See full 5.1.0 changelog here.

5.0.0

⚠ Breaking Changes

  • Python versions 3.8 and 3.9 are no longer supported.
  • bcd datasource has been removed.
  • Two internal tables were added, dipdup_contract_metadata and dipdup_token_metadata.
  • Some methods of tzkt datasource have changed their signatures and behavior.
  • Dummy advanced.oneshot config flag has been removed.
  • Dummy schema approve --hashes command flag has been removed.
  • docker init command has been removed.
  • ReindexingReason enumeration items have been changed.

⚠ Migration from 4.x

  • Ensure that you have a python = "^3.10" dependency in pyproject.toml.
  • Remove bcd datasources from config. Use metadata datasource instead to fetch contract and token metadata.
  • Update tzkt datasource method calls as described below.
  • Run the dipdup schema approve command on every database you use with 5.0.0.
  • Update usage of ReindexingReason enumeration if needed.

What's New

Process realtime messages with lag

Chain reorgs have occurred much more frequently since the Ithaca protocol reached mainnet. The preferable way to deal with rollbacks is the on_rollback hook. But if the logic of your indexer is too complex, you can buffer an arbitrary number of levels before processing to avoid reindexing.

datasources:
  tzkt_mainnet:
    kind: tzkt
    url: https://api.tzkt.io
    buffer_size: 2

DipDup tries to remove backtracked operations from the buffer instead of emitting a rollback. Ithaca guarantees operation finality after one block and block finality after two blocks, so to completely avoid reorgs, buffer_size should be set to 2.

BCD API takedown

Better Call Dev API was officially deprecated in February. Thus, it's time to say goodbye to the bcd datasource. In DipDup, it served the only purpose of fetching contract and token metadata. Now there's a separate metadata datasource which does the same thing, but better. If you have used the bcd datasource for custom requests, see the How to migrate from BCD to TzKT API article.

TzKT batch request pagination

Historically, most TzktDatasource methods had page iteration logic hidden inside. The number of items returned by TzKT in a single request is configured by HTTPConfig.batch_size and defaults to 10,000. Before this release, three requests would be performed by the get_big_map method to fetch 25,000 big map keys, leading to performance degradation and extensive memory usage.

| affected method          | response size in 4.x           | response size in 5.x          |
|--------------------------|--------------------------------|-------------------------------|
| get_similar_contracts    | unlimited                      | max. datasource.request_limit |
| get_originated_contracts | unlimited                      | max. datasource.request_limit |
| get_big_map              | unlimited                      | max. datasource.request_limit |
| get_contract_big_maps    | unlimited                      | max. datasource.request_limit |
| get_quotes               | first datasource.request_limit | max. datasource.request_limit |

All paginated methods now behave the same way. You can either iterate over pages manually or use iter_... helpers.

datasource = ctx.get_tzkt_datasource('tzkt_mainnet')
batch_iter = datasource.iter_big_map(
    big_map_id=big_map_id,
    level=last_level,
)
async for key_batch in batch_iter:
    for key in key_batch:
        ...

Metadata interface for TzKT integration

Starting with 5.0 you can store and expose custom contract and token metadata in the same format DipDup Metadata service does for TZIP-compatible metadata.

Enable this feature with advanced.metadata_interface flag, then update metadata in any callback:

await ctx.update_contract_metadata(
    network='mainnet',
    address='KT1...',
    metadata={'foo': 'bar'},
)

Metadata is stored in the dipdup_contract_metadata and dipdup_token_metadata tables and is available via GraphQL and REST APIs.
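
Token metadata can presumably be updated in the same way. The snippet below assumes a ctx.update_token_metadata helper symmetrical to update_contract_metadata with an extra token_id argument; verify the exact signature against the current API reference:

await ctx.update_token_metadata(
    network='mainnet',
    address='KT1...',
    token_id='0',
    metadata={'name': 'Foo Token', 'decimals': '8'},  # arbitrary example payload
)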

Prometheus integration

This version introduces initial Prometheus integration. It could help you set up monitoring, find performance issues in your code, and so on. To enable this integration, add the following lines to the config:

prometheus:
  host: 0.0.0.0

πŸ€“ SEE ALSO

Changes since 4.2.7

Added

  • config: Added custom section to store arbitrary user data.
  • metadata: Added metadata_interface feature flag to expose metadata in TzKT format.
  • prometheus: Added ability to expose Prometheus metrics.
  • tzkt: Added ability to process realtime messages with lag.
  • tzkt: Added missing fields to the HeadBlockData model.
  • tzkt: Added iter_... methods to iterate over item batches.

Fixed

  • config: Fixed default SQLite path (:memory:).
  • prometheus: Fixed invalid metric labels.
  • tzkt: Fixed pagination in several getter methods.
  • tzkt: Fixed data loss when skip_history option is enabled.
  • tzkt: Fixed crash in methods that do not support cursor pagination.
  • tzkt: Fixed possible OOM while calling methods that support pagination.
  • tzkt: Fixed possible data loss in get_originations and get_quotes methods.

Changed

  • tzkt: Added offset and limit arguments to all methods that support pagination.

Removed

  • bcd: Removed bcd datasource and config section.
  • cli: Removed docker init command.
  • cli: Removed dummy schema approve --hashes flag.
  • config: Removed dummy advanced.oneshot flag.

Performance

  • dipdup: Use fast orjson library instead of built-in json where possible.

4.2.0

What's new

ipfs datasource

While working with contract/token metadata, a typical scenario is to fetch it from IPFS. DipDup now has a separate datasource to perform such requests.

datasources:
  ipfs:
    kind: ipfs
    url: https://ipfs.io/ipfs

You can use this datasource within any callback. Output is either JSON or binary data.

ipfs = ctx.get_ipfs_datasource('ipfs')

file = await ipfs.get('QmdCz7XGkBtd5DFmpDPDN3KFRmpkQHJsDgGiG16cgVbUYu')
assert file[:4].decode()[1:] == 'PDF'

file = await ipfs.get('QmSgSC7geYH3Ae4SpUHy4KutxqNH9ESKBGXoCN4JQdbtEz/package.json')
assert file['name'] == 'json-buffer'

You can tune HTTP connection parameters with the http config field, just like any other datasource.

Sending arbitrary requests

DipDup datasources do not cover all available methods of the underlying APIs. Let's say you want to fetch the protocol of the chain you're currently indexing from TzKT:

tzkt = ctx.get_tzkt_datasource('tzkt_mainnet')
protocol_json = await tzkt.request(
    method='get',
    url='v1/protocols/current',
    cache=False,
    weight=1,  # ratelimiter leaky-bucket drops
)
assert protocol_json['hash'] == 'PtHangz2aRngywmSRGGvrcTyMbbdpWdpFKuS4uMWxg2RaH9i1qx'

Datasource HTTP connection parameters (ratelimit, backoff, etc.) are applied on every request.

Firing hooks outside of the current transaction

When configuring a hook, you can instruct DipDup to wrap it in a single database transaction:

hooks:
  my_hook:
    callback: my_hook
    atomic: True

Until now, such hooks could only be fired according to the jobs schedules, but not from a handler or another atomic hook using the ctx.fire_hook method. This limitation has been eliminated: use the wait argument to escape the current transaction:

async def handler(ctx: HandlerContext, ...) -> None:
    await ctx.fire_hook('atomic_hook', wait=False)

Spin up a new project with a single command

Cookiecutter is an excellent jinja2 wrapper to initialize hello-world templates of various frameworks and toolkits interactively. Install python-cookiecutter package systemwide, then call:

cookiecutter https://github.com/dipdup-net/cookiecutter-dipdup

Advanced scheduler configuration

DipDup utilizes apscheduler library to run hooks according to schedules in jobs config section. In the following example, apscheduler spawns up to three instances of the same job every time the trigger is fired, even if previous runs are in progress:

advanced:
  scheduler:
    apscheduler.job_defaults.coalesce: True
    apscheduler.job_defaults.max_instances: 3

See apscheduler docs for details.

Note that you can't use executors from the apscheduler.executors.pool module; a ConfigurationError exception is raised in that case. If you're into multiprocessing, I'll explain why in the next paragraph.

About the present and future of multiprocessing

It's impossible to use apscheduler pool executors with hooks because HookContext is not pickle-serializable. So, they are forbidden now in advanced.scheduler config. However, thread/process pools can come in handy in many situations, and it would be nice to have them in DipDup context. For now, I can suggest implementing custom commands as a workaround to perform any resource-hungry tasks within them. Put the following code in <project>/cli.py:

from contextlib import AsyncExitStack

import asyncclick as click
from dipdup.cli import cli, cli_wrapper
from dipdup.config import DipDupConfig
from dipdup.context import DipDupContext
from dipdup.utils.database import tortoise_wrapper


@cli.command(help='Run heavy calculations')
@click.pass_context
@cli_wrapper
async def do_something_heavy(ctx):
    config: DipDupConfig = ctx.obj.config
    url = config.database.connection_string
    models = f'{config.package}.models'

    async with AsyncExitStack() as stack:
        await stack.enter_async_context(tortoise_wrapper(url, models))
        ...

if __name__ == '__main__':
    cli(prog_name='dipdup', standalone_mode=False)  # type: ignore

Then use python -m <project>.cli instead of dipdup as an entrypoint. Now you can call do-something-heavy like any other dipdup command. dipdup.cli:cli group handles arguments and config parsing, graceful shutdown, and other boilerplate. The rest is on you; use dipdup.dipdup:DipDup.run as a reference. And keep in mind that Tortoise ORM is not thread-safe. I aim to implement ctx.pool_apply and ctx.pool_map methods to execute code in pools with magic within existing DipDup hooks, but no ETA yet.

That's all, folks! As always, your feedback is very welcome πŸ€™

4.1.0

⚠ Migration from 4.0 (optional)

  • Run dipdup schema init on the existing database to enable dipdup_head_status view and REST endpoint.

What's New

Index only the current state of big maps

big_map indexes allow achieving faster processing times than operation ones when storage updates are the only on-chain data your dapp needs to function. With this DipDup release, you can go even further and index only the current storage state, ignoring historical changes.

indexes:
  foo:
    kind: big_map
    ...
    skip_history: never|once|always

When this option is set to once, DipDup will skip historical changes only on initial sync and switch to regular indexing afterward. When the value is always, DipDup will fetch all big map keys on every restart. The preferable mode depends on your workload.

All big map diffs DipDup passes to handlers during fast sync have the action field set to BigMapAction.ADD_KEY. Keep in mind that DipDup fetches all keys in this mode, including ones already removed from the big map. If needed, you can filter out the latter by the BigMapDiff.data.active field, as shown below.
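
A handler sketch illustrating such a filter follows. The ledger path and the LedgerKey/LedgerValue typeclasses are hypothetical placeholders for the types init would generate for your contract:

from dipdup.context import HandlerContext
from dipdup.models import BigMapDiff

# Hypothetical generated typeclasses for a `ledger` big map path:
from demo_token.types.token.big_map.ledger_key import LedgerKey
from demo_token.types.token.big_map.ledger_value import LedgerValue


async def on_ledger_update(
    ctx: HandlerContext,
    ledger: BigMapDiff[LedgerKey, LedgerValue],
) -> None:
    # During fast sync with skip_history, every diff arrives as ADD_KEY,
    # including keys already removed from the big map; skip those.
    if not ledger.data.active:
        return
    ...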

New datasource for contract and token metadata

Since its first version, DipDup has allowed fetching token metadata from the Better Call Dev API with the bcd datasource. Now it's time for a better solution. Firstly, BCD is far from reliable in terms of metadata indexing. Secondly, spinning up your own instance of BCD requires significant effort and computing power. Lastly, we plan to deprecate the Better Call Dev API soon (but do not worry, it won't affect the explorer frontend).

Luckily, we have dipdup-metadata, a standalone companion indexer for DipDup written in Go. Configure a new datasource in the following way:

datasources:
  metadata:
    kind: metadata
    url: https://metadata.dipdup.net
    network: mainnet|hangzhounet

Now you can use it anywhere in your callbacks:

datasource = ctx.datasources['metadata']
token_metadata = await datasource.get_token_metadata(address, token_id)

bcd datasource will remain available for a while, but we discourage using it for metadata processing.

Nested packages for hooks and handlers

Callback modules no longer have to be in the top-level hooks/handlers directories. Add one or multiple dots to the callback name to define nested packages:

package: indexer
hooks:
  foo.bar:
    callback: foo.bar

After running init command, you'll get the following directory tree (shortened for readability):

indexer
β”œβ”€β”€ hooks
β”‚   β”œβ”€β”€ foo
β”‚   β”‚   β”œβ”€β”€ bar.py
β”‚   β”‚   └── __init__.py
β”‚   └── __init__.py
└── sql
    └── foo
        └── bar
            └── .keep

The same rules apply to handler callbacks. Note that callback field must be a valid Python package name - lowercase letters, underscores, and dots.

New CLI commands and flags

  • schema init is a new command to prepare a database for running DipDup. It will create tables based on your models, then call the on_reindex SQL hook to finish preparation - the same things DipDup does when run on a clean database.

  • hasura configure --force flag allows configuring Hasura even if the metadata hash matches the one saved in the database. It may come in handy during development.

  • init --keep-schemas flag makes DipDup preserve contract JSONSchemas. Usually, they are removed after generating typeclasses with datamodel-codegen, but you can keep them to convert to other formats or troubleshoot codegen issues.

Built-in dipdup_head_status view and REST endpoint

DipDup maintains several internal models to keep its state. As Hasura generates GraphQL queries and REST endpoints for those models, you can use them for monitoring. However, some SaaS monitoring solutions can only check whether an HTTP response contains a specific word or not. For such cases the dipdup_head_status view was added - a simplified representation of the dipdup_head table. It returns OK when the datasource received a head less than two minutes ago and OUTDATED otherwise. The latter means that something is stuck, either DipDup (e.g., because of a database deadlock) or the TzKT instance. Or maybe the whole Tezos blockchain, but in that case, you have bigger problems than indexing.

$ curl "http://127.0.0.1:41000/api/rest/dipdupHeadStatus?name=https%3A%2F%2Fapi.tzkt.io" 
{"dipdupHeadStatus":[{"status":"OUTDATED"}]}%

Note that dipdup_head update may be delayed during sync even if the --early-realtime flag is enabled, so don't rely exclusively on this endpoint.

Changelog

Added

  • cli: Added schema init command to initialize database schema.
  • cli: Added --force flag to hasura configure command.
  • codegen: Added support for subpackages inside callback directories.
  • hasura: Added dipdup_head_status view and REST endpoint.
  • index: Added an ability to skip historical data while synchronizing big_map indexes.
  • metadata: Added metadata datasource.
  • tzkt: Added get_big_map and get_contract_big_maps datasource methods.

4.0.0

⚠ Breaking Changes

  • run --oneshot option is removed. The oneshot mode (DipDup stops after the sync is finished) applies automatically when last_level field is set in the index config.
  • clear-cache command is removed. Use cache clear instead.

⚠ Migration from 3.x

  • Run dipdup init command to generate on_synchronized hook stubs.
  • Run dipdup schema approve command on every database you want to use with 4.0.0. Running dipdup migrate is not necessary since spec_version hasn't changed in this release.

What's New

Performance optimizations

Overall indexing performance has been significantly improved. Key highlights:

  • Configuration files are loaded 10x faster. The more indexes in the project, the more noticeable the difference is.
  • Significantly reduced CPU usage in realtime mode.
  • Datasource default HTTP connection options optimized for a reasonable balance between resource consumption and indexing speed.

Also, two new flags were added to improve DipDup performance in several scenarios: merge_subscriptions and early_realtime. See this paragraph for details.

Configurable action on reindex

There are several reasons that trigger reindexing:

| reason          | description |
|-----------------|-------------|
| manual          | Reindexing triggered manually from a callback with ctx.reindex. |
| migration       | Applied migration requires reindexing. Check release notes before switching between major DipDup versions to be prepared. |
| rollback        | Reorg message received from TzKT and cannot be processed. |
| config_modified | One of the index configs has been modified. |
| schema_modified | Database schema has been modified. Try to avoid manual schema modifications in favor of SQL hooks. |

Now it is possible to configure the desired action for reindexing triggered by each specific reason.

| action              | description |
|---------------------|-------------|
| exception (default) | Raise ReindexingRequiredError and quit with error code. The safest option since you can trigger reindexing accidentally, e.g., by a typo in config. Don't forget to set up the correct restart policy when using it with containers. |
| wipe                | Drop the whole database and start indexing from scratch. Be careful with this option! |
| ignore              | Ignore the event and continue indexing as usual. It can lead to unexpected side-effects up to data corruption; make sure you know what you are doing. |

To configure actions for each reason, add the following section to DipDup config:

advanced:
  ...
  reindex:
    manual: wipe
    migration: exception
    rollback: ignore
    config_modified: exception
    schema_modified: exception

New CLI commands and flags

| command or flag           | description |
|---------------------------|-------------|
| cache show                | Get information about file caches used by DipDup. |
| config export             | Print config after resolving all links and variables. Add the --unsafe option to substitute environment variables; default values from config will be used otherwise. |
| run --early-realtime      | Establish a realtime connection before all indexes are synchronized. |
| run --merge-subscriptions | Subscribe to all operations/big map diffs during realtime indexing. This flag helps to avoid reaching the TzKT subscriptions limit (currently 10000 channels). Keep in mind that this option could significantly increase RAM consumption depending on the time required to perform a sync. |
| status                    | Print the current status of indexes from the database. |

advanced top-level config section

This config section allows users to tune system-wide options, either experimental or unsuitable for generic configurations.

| field               | description |
|---------------------|-------------|
| early_realtime      | Another way to set run command flags. Useful for maintaining per-deployment configurations. |
| merge_subscriptions | Another way to set run command flags. Useful for maintaining per-deployment configurations. |
| postpone_jobs       | Another way to set run command flags. Useful for maintaining per-deployment configurations. |
| reindex             | Configure the action on triggered reindexing. See this paragraph for details. |

CLI flags have priority over self-titled AdvancedConfig fields.

aiosignalrcore replaced with pysignalr

It may not be the most noticeable improvement for the end user, but it still deserves a separate paragraph in this article.

Historically, DipDup used our own fork of signalrcore library named aiosignalrcore. This project aimed to replace the synchronous websocket-client library with asyncio-ready websockets. Later we discovered that required changes make it hard to maintain backward compatibility, so we have decided to rewrite this library from scratch. So now you have both a modern and reliable library for SignalR protocol and a much more stable DipDup. Ain't it nice?

Changes since 3.1.3

This is a combined changelog of -rc versions released since the last stable release until this one.

Added

  • cli: Added run --early-realtime flag to establish a realtime connection before all indexes are synchronized.
  • cli: Added run --merge-subscriptions flag to subscribe to all operations/big map diffs during realtime indexing.
  • cli: Added status command to print the current status of indexes from the database.
  • cli: Added config export [--unsafe] command to print config after resolving all links and variables.
  • cli: Added cache show command to get information about file caches used by DipDup.
  • config: Added first_level and last_level optional fields to TemplateIndexConfig. These limits are applied after ones from the template itself.
  • config: Added daemon boolean field to JobConfig to run a single callback indefinitely. Conflicts with crontab and interval fields.
  • config: Added advanced top-level section.
  • hooks: Added on_synchronized hook, which fires each time all indexes reach realtime state.

Fixed

  • cli: Fixed config not being verified when invoking some commands.
  • cli: Fixed crashes and output inconsistency when piping DipDup commands.
  • cli: Fixed missing schema approve --hashes argument.
  • cli: Fixed schema wipe --immune flag being ignored.
  • codegen: Fixed contract address used instead of an alias when typename is not set.
  • codegen: Fixed generating callback arguments for untyped operations.
  • codegen: Fixed missing imports in handlers generated during init.
  • coinbase: Fixed possible data inconsistency caused by caching enabled for method get_candles.
  • hasura: Fixed unnecessary reconfiguration in restart.
  • http: Fixed increasing sleep time between failed request attempts.
  • index: Fixed CallbackError raised instead of ReindexingRequiredError in some cases.
  • index: Fixed crash while processing storage of some contracts.
  • index: Fixed incorrect log messages, remove duplicate ones.
  • index: Fixed invocation of head index callback.
  • index: Fixed matching of untyped operations filtered by source field (@pravin-d).
  • tzkt: Fixed filtering of big map diffs by the path.
  • tzkt: Fixed get_originated_contracts and get_similar_contracts methods whose output was limited to HTTPConfig.batch_size field.
  • tzkt: Fixed lots of SignalR bugs by replacing aiosignalrcore library with pysignalr.
  • tzkt: Fixed processing operations with entrypoint default.
  • tzkt: Fixed regression in processing migration originations.
  • tzkt: Fixed resubscribing when realtime connectivity is lost for a long time.
  • tzkt: Fixed sending useless subscription requests when adding indexes in runtime.

Changed

  • cli: schema wipe command now requires confirmation when invoked in the interactive shell.
  • cli: schema approve command now also causes a recalculation of schema and index config hashes.
  • index: DipDup will recalculate respective hashes if reindexing is triggered with config_modified: ignore or schema_modified: ignore in advanced config.

Removed

  • cli: Removed deprecated run --oneshot argument and clear-cache command.

Performance

  • config: Configuration files are loaded 10x times faster.
  • index: Checks performed on each iteration of the main DipDup loop are slightly faster now.
  • index: Number of operations processed by matcher reduced by 40%-95% depending on the number of addresses and entrypoints used.
  • tzkt: Improved performance of response deserialization.
  • tzkt: Rate limit was increased. Try to set connection_timeout to a higher value if requests fail with ConnectionTimeout exception.