DipDup
DipDup is a Python framework for building indexers of Tezos smart contracts. It helps developers focus on business logic instead of writing boilerplate for storing and serving data. DipDup-based indexers are selective, which means only the required data is requested. This approach results in faster indexing times and a decreased load on the APIs DipDup uses.
- Ready to build your first indexer? Head to Quickstart.
- Looking for examples? Check out Demo Projects.
- Want to contribute? See Contribution Guide.
This project is maintained by the Baking Bad team. Development is supported by Tezos Foundation.
Quickstart
This page will guide you through the steps to get your first selective indexer up and running in a few minutes without getting too deep into the details.
Let's create an indexer for the tzBTC FA1.2 token contract. Our goal is to save all token transfers to the database and then calculate some statistics of its holders' activity.
A Linux environment with Python 3.10+ installed is required to use DipDup.
Create a new project
From template
Cookiecutter is a cool jinja2 wrapper to initialize hello-world templates of various frameworks and toolkits interactively. Install the python-cookiecutter package system-wide, then call:
cookiecutter https://github.com/dipdup-net/cookiecutter-dipdup
From scratch
We advise using the poetry package manager for new projects.
poetry init
poetry add dipdup
poetry shell
Write a configuration file
DipDup configuration is stored in YAML files of a specific format. Create a new file named dipdup.yml in your current working directory with the following content:
spec_version: 1.2
package: demo_tzbtc

database:
  kind: sqlite
  path: demo_tzbtc.sqlite3

contracts:
  tzbtc_mainnet:
    address: KT1PWx2mnDueood7fEmfbBDKx1D9BAnnXitn
    typename: tzbtc

datasources:
  tzkt_mainnet:
    kind: tzkt
    url: https://api.tzkt.io

indexes:
  tzbtc_holders_mainnet:
    kind: operation
    datasource: tzkt_mainnet
    contracts:
      - tzbtc_mainnet
    handlers:
      - callback: on_transfer
        pattern:
          - destination: tzbtc_mainnet
            entrypoint: transfer
      - callback: on_mint
        pattern:
          - destination: tzbtc_mainnet
            entrypoint: mint
Initialize project tree
Now it's time to generate typeclasses and callback stubs. Run the following command:
dipdup init
DipDup will create a Python package demo_tzbtc
having the following structure:
demo_tzbtc
├── graphql
├── handlers
│   ├── __init__.py
│   ├── on_mint.py
│   └── on_transfer.py
├── hooks
│   ├── __init__.py
│   ├── on_reindex.py
│   ├── on_restart.py
│   ├── on_index_rollback.py
│   └── on_synchronized.py
├── __init__.py
├── models.py
├── sql
│   ├── on_reindex
│   ├── on_restart
│   ├── on_index_rollback
│   └── on_synchronized
└── types
    ├── __init__.py
    └── tzbtc
        ├── __init__.py
        ├── parameter
        │   ├── __init__.py
        │   ├── mint.py
        │   └── transfer.py
        └── storage.py
That's a lot of files and directories! But don't worry, we will only need the models.py and handlers modules in this guide.
Define data models
Our schema will consist of a single model, Holder, with several fields:
- address — account address
- balance — token balance in tzBTC
- volume — total transfer/mint amount bypassed
- tx_count — number of transfers/mints
- last_seen — time of the last transfer/mint
Put the following content in the models.py file:
from tortoise import Model, fields


class Holder(Model):
    address = fields.CharField(max_length=36, pk=True)
    balance = fields.DecimalField(decimal_places=8, max_digits=20, default=0)
    volume = fields.DecimalField(decimal_places=8, max_digits=20, default=0)
    tx_count = fields.BigIntField(default=0)
    last_seen = fields.DatetimeField(null=True)
Implement handlers
Everything is ready to implement the actual indexer logic.
Our task is to index all the balance updates, so we'll start with a helper method to handle them. Create a file named on_balance_update.py in the handlers package with the following content:
from datetime import datetime
from decimal import Decimal

import demo_tzbtc.models as models


async def on_balance_update(
    address: str,
    balance_update: Decimal,
    timestamp: datetime,
) -> None:
    holder, _ = await models.Holder.get_or_create(address=address)
    holder.balance += balance_update
    holder.tx_count += 1
    holder.last_seen = timestamp
    assert holder.balance >= 0, address
    await holder.save()
Three methods of the tzBTC contract can alter token balances — transfer, mint, and burn. The last one is omitted in this tutorial for simplicity. Edit the corresponding handlers to call the on_balance_update method with data from matched operations:
on_transfer.py
from typing import Optional
from decimal import Decimal

from dipdup.models import Transaction
from dipdup.context import HandlerContext

import demo_tzbtc.models as models

from demo_tzbtc.types.tzbtc.parameter.transfer import TransferParameter
from demo_tzbtc.types.tzbtc.storage import TzbtcStorage
from demo_tzbtc.handlers.on_balance_update import on_balance_update


async def on_transfer(
    ctx: HandlerContext,
    transfer: Transaction[TransferParameter, TzbtcStorage],
) -> None:
    if transfer.parameter.from_ == transfer.parameter.to:
        # NOTE: Internal tzBTC transaction
        return

    amount = Decimal(transfer.parameter.value) / (10 ** 8)
    await on_balance_update(
        address=transfer.parameter.from_,
        balance_update=-amount,
        timestamp=transfer.data.timestamp,
    )
    await on_balance_update(
        address=transfer.parameter.to,
        balance_update=amount,
        timestamp=transfer.data.timestamp,
    )
on_mint.py
from typing import Optional
from decimal import Decimal

from dipdup.models import Transaction
from dipdup.context import HandlerContext

import demo_tzbtc.models as models

from demo_tzbtc.types.tzbtc.parameter.mint import MintParameter
from demo_tzbtc.types.tzbtc.storage import TzbtcStorage
from demo_tzbtc.handlers.on_balance_update import on_balance_update


async def on_mint(
    ctx: HandlerContext,
    mint: Transaction[MintParameter, TzbtcStorage],
) -> None:
    amount = Decimal(mint.parameter.value) / (10 ** 8)
    await on_balance_update(
        address=mint.parameter.to,
        balance_update=amount,
        timestamp=mint.data.timestamp,
    )
And that's all! We can run the indexer now.
Run your indexer
dipdup run
DipDup will fetch all the historical data and then switch to realtime updates. Your application data has been successfully indexed!
Getting started
This part of the docs covers the same features as the Quickstart article but focuses more on the details.
Installation
This page covers the installation of DipDup in different environments.
Host requirements
A Linux environment with Python 3.10 installed is required to use DipDup.
Minimum hardware requirements are 256 MB RAM, 1 CPU core, and some disk space for the database.
Non-Linux environments
Other UNIX-like systems (macOS, FreeBSD, etc.) should work but are not supported officially.
DipDup currently doesn't work in Windows environments due to incompatibilities in libraries it depends on. Please use WSL or Docker.
We aim to improve cross-platform compatibility in future releases.
Local installation
To begin with, create a new directory for your project and enter it. Now choose one way of managing virtual environments:
Poetry (recommended)
Initialize a new PEP 518 project and add DipDup to the dependencies.
poetry init
poetry add dipdup
pip
Create a new virtual environment and install DipDup in it.
python -m venv .venv
source .venv/bin/activate
pip install dipdup
Other options
Core concepts
Big picture
DipDup is heavily inspired by The Graph Protocol, but there are several differences:
- DipDup works with operation groups (explicit operation and all internal ones) and Big_map updates (lazy hash map structures) — until fully-fledged events are implemented in Tezos.
- DipDup utilizes a microservice approach and relies heavily on existing solutions, making the SDK very lightweight and allowing it to switch API engines on demand.
Consider DipDup a set of best practices for building custom backends for decentralized applications, plus a toolkit that spares you from writing boilerplate code.
DipDup is tightly coupled with TzKT API but can generally use any data provider which implements a particular feature set. TzKT provides REST endpoints and Websocket subscriptions with flexible filters enabling selective indexing and returns "humanified" contract data, which means you don't have to handle raw Michelson expressions.
DipDup offers PostgreSQL + Hasura GraphQL Engine combo out-of-the-box to expose indexed data via REST and GraphQL with minimal configuration. However, you can use any database and API engine (e.g., write your own API backend).
How it works
From the developer's perspective, there are three main steps for creating an indexer using DipDup framework:
- Write a declarative configuration file containing all the inventory and indexing rules.
- Describe your domain-specific data models.
- Implement the business logic, which is how to convert blockchain data to your models.
As a result, you get a service responsible for filling the database with the indexed data.
Within this service, there can be multiple indexers running independently.
Atomicity and persistency
DipDup applies all updates atomically block by block. In case of an emergency shutdown, it can safely recover later and continue from the level it ended. DipDup state is stored in the database per index and can be used by API consumers to determine the current indexer head.
Here are a few essential things to know before running your indexer:
- Ensure that the database you're connecting to is used by DipDup exclusively. Changes in index configuration or models require DipDup to drop the whole database and start indexing from scratch.
- Do not rename existing indexes in the config file without cleaning up the database first. DipDup won't handle that automatically and will treat the renamed index as new.
- Multiple indexes pointing to different contracts should not reuse the same models (unless you know what you are doing) because synchronization is done sequentially by index.
Schema migration
DipDup does not support database schema migration: if there's any model change, it will trigger reindexing. The rationale is that it's easier and faster to start over than handle migrations that can be of arbitrary complexity and do not guarantee data consistency.
DipDup stores a hash of the SQL version of the DB schema and checks for changes each time you run indexing.
Handling chain reorgs
TzKT emits reorg messages to signal chain reorganizations: some blocks, including all their operations, are rolled back in favor of a branch with higher fitness. Chain reorgs happen regularly (especially in testnets), so it's not something you can ignore. You must handle such messages correctly; otherwise, you will likely accumulate duplicate or invalid data. You can implement your own rollback logic by editing the on_index_rollback hook.
Single level
Single level rollbacks are processed in the following way:
- If the new block has the same subset of operations as the replaced one — do nothing;
- If the new block has all the operations from the replaced one AND several new operations — process those new operations;
- If the new block misses some operations from the replaced one: trigger full reindexing.
Preparing inventory
🚧 UNDER CONSTRUCTION
This page or paragraph is yet to be written. Come back later.
Before starting indexing, you need to set up several things:
- Contracts you want to process with your indexer. See 12.2. contracts for details.
- Datasources used both by DipDup internally and by the user on demand. See 12.4. datasources for details.
- Indexes. See 12.7. indexes for details.
Project structure
The structure of the DipDup project package is the following:
demo_tzbtc
├── graphql
├── handlers
│   ├── __init__.py
│   ├── on_mint.py
│   └── on_transfer.py
├── hooks
│   ├── __init__.py
│   ├── on_reindex.py
│   ├── on_restart.py
│   ├── on_index_rollback.py
│   └── on_synchronized.py
├── __init__.py
├── models.py
├── sql
│   ├── on_reindex
│   ├── on_restart
│   ├── on_index_rollback
│   └── on_synchronized
└── types
    ├── __init__.py
    └── tzbtc
        ├── __init__.py
        ├── parameter
        │   ├── __init__.py
        │   ├── mint.py
        │   └── transfer.py
        └── storage.py
path | description |
---|---|
graphql | GraphQL queries for Hasura (*.graphql ) |
handlers | User-defined callbacks to process matched operations and big map diffs |
hooks | User-defined callbacks to run manually or by schedule |
models.py | Tortoise ORM models |
sql | SQL scripts to run from callbacks (*.sql ) |
types | Codegened Pydantic typeclasses for contract storage/parameter |
DipDup will generate all the necessary directories and files inside the project's root on the init command. These include contract type definitions and callback stubs to be implemented by the developer.
Type classes
DipDup receives all smart contract data (transaction parameters, resulting storage, big_map updates) in normalized form (read more about how TzKT handles Michelson expressions) but still as raw JSON. DipDup uses contract type information to generate data classes, which allow developers to work with strictly typed data.
DipDup generates Pydantic models out of JSONSchema. You might want to install additional plugins (PyCharm, mypy) for convenient work with this library.
The following models are created at init:
- operation indexes: the storage type for all contracts met in handler patterns plus the parameter type for all destination+entrypoint pairs.
- big_map indexes: key and value types for all big map paths in handler configs.
Nested packages
Callback modules don't have to be located in the top-level hooks/handlers directories. Add one or more dots to the callback name to define nested packages:
package: indexer
hooks:
  foo.bar:
    callback: foo.bar
After running the init command, you'll get the following directory tree (shortened for readability):
indexer
├── hooks
│   ├── foo
│   │   ├── bar.py
│   │   └── __init__.py
│   └── __init__.py
└── sql
    └── foo
        └── bar
            └── .keep
The same rules apply to handler callbacks. Note that the callback field must be a valid Python package name: lowercase letters, underscores, and dots.
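With this layout, the nested hook can be fired from any callback by its dotted name; a short sketch (the wait flag is described in the Hooks section):

# Fire the `foo.bar` hook defined above by its dotted name;
# `wait=False` runs it outside of the current database transaction.
await ctx.fire_hook('foo.bar', wait=False)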
Templates and variables
Templates allow you to reuse index configuration, e.g., for different networks (mainnet/testnet) or multiple contracts sharing the same codebase.
templates:
  my_template:
    kind: operation
    datasource: <datasource>
    contracts:
      - <contract1>
    handlers:
      - callback: callback1
        pattern:
          - destination: <contract1>
            entrypoint: call
Templates have the same syntax as indexes of all kinds; the only difference is that they additionally support placeholders enabling parameterization:
field: <placeholder>
Any string value wrapped in angle brackets is treated as a placeholder, so make sure there are no collisions with the actual values. You can use a single placeholder multiple times.
Any index implementing a template must provide a value for each existing placeholder; otherwise, an exception is raised. Those values are available in the handler context at ctx.template_values.
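For illustration, a handler spawned from such a template could read those values like this. This is a minimal sketch: the placeholder name contract is hypothetical, and dictionary-style access to ctx.template_values is an assumption.

from dipdup.context import HandlerContext


async def callback1(ctx: HandlerContext, call) -> None:
    # Values come from the `values` mapping of the index that instantiated the template
    contract_name = ctx.template_values['contract']
    ctx.logger.info('Processing a call to %s', contract_name)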
Defining models
DipDup uses the Tortoise ORM library to cover database operations. During initialization, DipDup generates a models.py file on the top level of the package; it will contain all database models. The name and location of this file cannot be changed.
A typical models.py file looks like the following:
from tortoise import Tortoise, fields
from tortoise.models import Model


class Event(Model):
    id = fields.IntField(pk=True)
    name = fields.TextField()
    datetime = fields.DatetimeField(null=True)
See the links below to learn how to use this library.
Limitations
There are some limitations introduced to make Hasura GraphQL integration easier.
- Table names must be in snake_case
- Model fields must be in snake_case
- Model fields must differ from the table name
Implementing handlers
DipDup generates a separate file with a callback stub for each handler in every index specified in the configuration file.
In the case of the transaction handler, the callback method signature is the following:
from <package>.types.<typename>.parameter.<entrypoint_1> import EntryPoint1Parameter
from <package>.types.<typename>.parameter.<entrypoint_n> import EntryPointNParameter
from <package>.types.<typename>.storage import TypeNameStorage


async def on_transaction(
    ctx: HandlerContext,
    entrypoint_1: Transaction[EntryPoint1Parameter, TypeNameStorage],
    entrypoint_n: Transaction[EntryPointNParameter, TypeNameStorage],
) -> None:
    ...
where:
- entrypoint_1 ... entrypoint_n are items from the corresponding handler pattern.
- ctx: HandlerContext provides useful helpers and contains an internal state.
- A Transaction model contains the typed transaction parameter and storage, plus other fields.
For the origination case, the handler signature will look similar:
from <package>.types.<typename>.storage import TypeNameStorage


async def on_origination(
    ctx: HandlerContext,
    origination: Origination[TypeNameStorage],
)
An Origination model contains the origination script, initial storage (typed), amount, delegate, etc.
A Big_map update handler will look like the following:
from <package>.types.<typename>.big_map.<path>_key import PathKey
from <package>.types.<typename>.big_map.<path>_value import PathValue


async def on_update(
    ctx: HandlerContext,
    update: BigMapDiff[PathKey, PathValue],
)
BigMapDiff contains an action (allocate, update, or remove) and nullable typed key and value.
You can safely change argument names (e.g., in case of collisions).
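For illustration, here's a hedged sketch of a complete big_map handler. The demo_registry package, the registry typename, the ledger path, the Entry model, and the key/value fields are all hypothetical; only the import layout follows the conventions above.

from dipdup.context import HandlerContext
from dipdup.models import BigMapDiff

import demo_registry.models as models
from demo_registry.types.registry.big_map.ledger_key import LedgerKey
from demo_registry.types.registry.big_map.ledger_value import LedgerValue


async def on_ledger_update(
    ctx: HandlerContext,
    ledger: BigMapDiff[LedgerKey, LedgerValue],
) -> None:
    if ledger.value is None:
        # Value is None when the key was removed from the big map
        await models.Entry.filter(address=ledger.key.address).delete()  # hypothetical key field
        return
    await models.Entry.update_or_create(
        address=ledger.key.address,                  # hypothetical key field
        defaults={'balance': ledger.value.balance},  # hypothetical value field
    )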
Naming conventions
The Python language requires all module and function names to be in snake_case and all class names to be in PascalCase.
A typical imports section of a big_map handler callback looks like this:
from <package>.types.<typename>.storage import TypeNameStorage
from <package>.types.<typename>.parameter.<entrypoint> import EntryPointParameter
from <package>.types.<typename>.big_map.<path>_key import PathKey
from <package>.types.<typename>.big_map.<path>_value import PathValue
Here typename is defined in the contract inventory, entrypoint is specified in the handler pattern, and path is taken from the handler config.
DipDup does not automatically handle name collisions. Use import ... as if multiple contracts have entrypoints that share the same name:
from <package>.types.<typename>.parameter.<entrypoint> import EntryPointParameter as Alias
Advanced usage
In this section, you will find information about advanced DipDup features.
Datasources
Datasources are DipDup connectors to various APIs. TzKT data is used for indexing; other sources are complementary.
 | tzkt | tezos-node | coinbase | metadata | ipfs | http |
---|---|---|---|---|---|---|
Callback context (via ctx.datasources ) | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ |
DipDup index | ✅* | ❌ | ❌ | ❌ | ❌ | ❌ |
mempool service | ✅* | ✅* | ❌ | ❌ | ❌ | ❌ |
metadata service | ✅* | ❌ | ❌ | ❌ | ❌ | ❌ |
* - required
TzKT
TzKT provides REST endpoints to query historical data and SignalR (Websocket) subscriptions to get realtime updates. Flexible filters allow you to request only data needed for your application and drastically speed up the indexing process.
datasources:
  tzkt_mainnet:
    kind: tzkt
    url: https://api.tzkt.io
TzKT datasource is based on generic HTTP datasource and thus inherits its settings (optional):
datasources:
  tzkt_mainnet:
    http:
      cache: false
      retry_count:  # retry infinitely
      retry_sleep:
      retry_multiplier:
      ratelimit_rate:
      ratelimit_period:
      connection_limit: 100
      connection_timeout: 60
      batch_size: 10000
You can also wait for several block confirmations before processing operations, e.g., to mitigate chain reorgs:
datasources:
  tzkt_mainnet:
    buffer_size: 1  # indexing with a single block lag
Tezos node
Tezos RPC is a standard interface provided by the Tezos node. It's not suitable for indexing purposes but is used for accessing mempool data and other things that are not available through TzKT.
datasources:
  tezos_node_mainnet:
    kind: tezos-node
    url: https://mainnet-tezos.giganode.io
Coinbase
A connector for the Coinbase Pro API. It provides get_candles and get_oracle_data methods and may be useful for enriching indexes of DeFi contracts with off-chain data.
datasources:
  coinbase:
    kind: coinbase
Please note that Coinbase can't replace TzKT as an index datasource. However, you can access it via the ctx.datasources mapping both within handler and job callbacks.
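For illustration, here's a hedged sketch of fetching candles from inside a hook callback. The datasource name comes from the config above, but the get_candles parameters are assumptions; check the datasource implementation of your DipDup version before relying on them.

from datetime import datetime, timedelta, timezone

from dipdup.context import HookContext


async def fetch_quotes(ctx: HookContext) -> None:
    coinbase = ctx.datasources['coinbase']
    until = datetime.now(timezone.utc)
    since = until - timedelta(hours=1)
    # Assumed signature: trading pair, time range, and candle interval
    candles = await coinbase.get_candles(since=since, until=until, interval='1m', ticker='XTZ-USD')
    ctx.logger.info('Received %s candles', len(candles))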
DipDup Metadata
dipdup-metadata is a standalone companion indexer for DipDup written in Go. Configure datasource in the following way:
datasources:
  metadata:
    kind: metadata
    url: https://metadata.dipdup.net
    network: mainnet|hangzhounet
IPFS
While working with contract/token metadata, a typical scenario is to fetch it from IPFS. DipDup now has a separate datasource to perform such requests.
datasources:
  ipfs:
    kind: ipfs
    url: https://ipfs.io/ipfs
You can use this datasource within any callback. Output is either JSON or binary data.
ipfs = ctx.get_ipfs_datasource('ipfs')
file = await ipfs.get('QmdCz7XGkBtd5DFmpDPDN3KFRmpkQHJsDgGiG16cgVbUYu')
assert file[:4].decode()[1:] == 'PDF'
file = await ipfs.get('QmSgSC7geYH3Ae4SpUHy4KutxqNH9ESKBGXoCN4JQdbtEz/package.json')
assert file['name'] == 'json-buffer'
Sending arbitrary requests
DipDup datasources do not cover all available methods of the underlying APIs. Let's say you want to fetch the protocol of the chain you're currently indexing from TzKT:
tzkt = ctx.get_tzkt_datasource('tzkt_mainnet')
protocol_json = await tzkt.request(
    method='get',
    url='v1/protocols/current',
    cache=False,
    weight=1,  # ratelimiter leaky-bucket drops
)
assert protocol_json['hash'] == 'PtHangz2aRngywmSRGGvrcTyMbbdpWdpFKuS4uMWxg2RaH9i1qx'
Datasource HTTP connection parameters (ratelimit, backoff, etc.) are applied on every request.
Hooks
Hooks are user-defined callbacks called either from the ctx.fire_hook method or by the job scheduler (the jobs config section; we'll return to this topic later).
Let's assume we want to calculate some statistics on-demand to avoid blocking an indexer with heavy computations. Add the following lines to DipDup config:
hooks:
  calculate_stats:
    callback: calculate_stats
    atomic: False
    args:
      major: bool
      depth: int
A couple of things to pay attention to here:
- The atomic option defines whether the hook callback will be wrapped in a single SQL transaction or not. If this option is set to true, the main indexing loop will be blocked until hook execution is complete. Some statements, like REFRESH MATERIALIZED VIEW, do not need to be wrapped in transactions, so choosing the right value of the atomic option could decrease the time needed to perform initial indexing.
- Values of the args mapping are used as type hints in the signature of the generated callback. We will return to this topic later in this article.
Now it's time to call dipdup init. The following files will be created in the project's root:
├── hooks
│   └── calculate_stats.py
└── sql
    └── calculate_stats
        └── .keep
Content of the generated callback stub:
from dipdup.context import HookContext


async def calculate_stats(
    ctx: HookContext,
    major: bool,
    depth: int,
) -> None:
    await ctx.execute_sql('calculate_stats')
By default, hooks execute SQL scripts from the corresponding subdirectory of sql/. Remove or comment out the execute_sql call to prevent this. This way, both Python and SQL code may be executed in a single hook if needed.
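The hook from the example above can then be fired from any handler or job; a short sketch (the argument values are arbitrary):

# Fire the `calculate_stats` hook defined above; `wait=False` executes it
# outside of the current database transaction (fire and forget).
await ctx.fire_hook('calculate_stats', wait=False, major=True, depth=10)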
Default hooks
Every DipDup project has multiple default hooks; they fire on system-wide events and, like regular hooks, are not linked to any index. The names of those hooks are reserved; you can't use them in config.
on_index_rollback
Fires when TzKT datasource has received a chain reorg message which can't be processed automatically.
If your indexer is stateless, you can just drop the DB data saved after to_level and continue indexing. Otherwise, implement more complex logic. By default, this hook triggers full reindexing.
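For illustration, a minimal sketch of this hook; the parameter list here is an assumption based on the description above, so keep the signature that dipdup init generates for you.

from dipdup.context import HookContext


async def on_index_rollback(
    ctx: HookContext,
    index,            # the affected index (assumed parameter)
    from_level: int,  # level received before the reorg (assumed parameter)
    to_level: int,    # level to roll back to (assumed parameter)
) -> None:
    ctx.logger.warning('Rollback from %s to %s', from_level, to_level)
    # A stateless indexer could delete its own rows above `to_level` here
    # and return. The default behaviour is to trigger full reindexing:
    await ctx.reindex()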
on_restart
This hook executes right before starting indexing. It allows configuring DipDup at runtime based on data from external sources. Datasources are already initialized at execution time and available at ctx.datasources. You can, for example, configure logging here or add contracts and indexes at runtime instead of from the static config.
on_reindex
This hook fires after the database is re-initialized after reindexing (wipe). It is helpful for modifying the schema with arbitrary SQL scripts before indexing starts.
on_synchronized
This hook fires when every active index reaches a realtime state. Here you can clear internal caches or do other cleanup.
Job scheduler
Jobs are schedules for hooks. In some cases, it may come in handy to have the ability to run some code on a schedule. For example, you may want to calculate statistics once per hour instead of every time a handler gets matched.
Arguments typechecking
DipDup will ensure that arguments passed to the hooks have the correct types when possible; a CallbackTypeError exception will be raised otherwise. Values of the args mapping in a hook config should be either built-in types or the __qualname__ of an external type like decimal.Decimal. Generic types are not supported: hints like Optional[int] = None will be correctly parsed during codegen but ignored during type checking.
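For illustration, a hypothetical hook config with args of amount: decimal.Decimal and depth: int would produce a callback stub along these lines (a sketch, not actual generated output):

from decimal import Decimal

from dipdup.context import HookContext


async def update_totals(
    ctx: HookContext,
    amount: Decimal,  # external type referenced by its __qualname__, `decimal.Decimal`
    depth: int,       # built-in type
) -> None:
    await ctx.execute_sql('update_totals')

Firing this hook with, say, amount='oops' would then raise CallbackTypeError.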
See 12.8. jobs for details.
Reindexing
In some cases, DipDup can't proceed with indexing without a full wipe. Several reasons trigger reindexing; some are avoidable, some are not:
reason | description |
---|---|
manual | Reindexing triggered manually from callback with ctx.reindex . |
migration | Applied migration requires reindexing. Check release notes before switching between major DipDup versions to be prepared. |
rollback | Reorg message received from TzKT can not be processed. |
config_modified | One of the index configs has been modified. |
schema_modified | Database schema has been modified. Try to avoid manual schema modifications in favor of SQL scripts. |
It is possible to configure the desired action for reindexing triggered by each specific reason.
action | description |
---|---|
exception (default) | Raise ReindexingRequiredError and quit with error code. The safest option since you can trigger reindexing accidentally, e.g., by a typo in config. Don't forget to set up the correct restart policy when using it with containers. |
wipe | Drop the whole database and start indexing from scratch. Be careful with this option! |
ignore | Ignore the event and continue indexing as usual. It can lead to unexpected side-effects up to data corruption; make sure you know what you are doing. |
To configure actions for each reason, add the following section to the DipDup config:
advanced:
  ...
  reindex:
    manual: wipe
    migration: exception
    rollback: ignore
    config_modified: exception
    schema_modified: exception
Feature flags
Feature flags allow users to modify some system-wide tunables that affect the behavior of the whole framework. These options are either experimental or unsuitable for generic configurations.
run command option | config path | is stable |
---|---|---|
--early-realtime | advanced.early_realtime | ✅ |
--merge-subscriptions | advanced.merge_subscriptions | ✅ |
--postpone-jobs | advanced.postpone_jobs | ✅ |
--metadata-interface | advanced.metadata_interface | ✅ |
 | advanced.skip_version_check | ✅ |
A good practice is to set feature flags in environment-specific config files.
Early realtime
By default, DipDup enters a sync state twice: before and after establishing a realtime connection. This flag allows it to start collecting realtime messages while sync is in progress, right after indexes are loaded.
Let's consider two scenarios:
- Indexing 10 contracts with 10,000 operations each. Initial indexing could take several hours. There is no need to accumulate incoming operations since the resync time after establishing a realtime connection depends on the number of contracts, thus taking a negligible amount of time.
- Indexing 10,000 contracts with 10 operations each. Both initial sync and resync will take a while, but the number of operations received during this time won't affect RAM consumption much.
If you do not have strict RAM constraints, it's recommended to enable this flag. You'll get faster indexing times and decreased load on TzKT API.
Merge subscriptions
Subscribe to all operations/big map diffs during realtime indexing instead of using separate channels. This flag helps to avoid the 10,000 subscription limit of TzKT and speeds up processing. The downside is increased RAM consumption during sync, especially if the early_realtime flag is enabled too.
Postpone jobs
Do not start the job scheduler until all indexes are synchronized. If your jobs perform some calculations that make sense only after indexing is fully finished, this toggle can save you some IOPS.
Metadata interface
Without this flag, calling ctx.update_contract_metadata and ctx.update_token_metadata will have no effect. The corresponding internal tables are created on reindexing either way.
Skip version check
Disables warning about running unstable or out-of-date DipDup version.
Executing SQL scripts
Put your *.sql scripts into <package>/sql. You can run these scripts from any callback with ctx.execute_sql('name'). If name is a directory, each script it contains will be executed.
Both types of scripts are executed without being wrapped in SQL transactions. It's generally a good idea to avoid touching table data in scripts.
SQL scripts are ignored if SQLite is used as a database backend.
By default, an empty sql/<hook_name> directory is generated for every hook in the config during init. Comment out execute_sql in the hook code to avoid executing them.
Default hooks
Scripts from the sql/on_restart directory are executed each time you run DipDup. Those scripts may contain CREATE OR REPLACE VIEW or similar non-destructive operations.
Scripts from the sql/on_reindex directory are executed after the database schema is created based on the models.py module but before indexing starts. It may be useful to change the database schema in ways that are not supported by the Tortoise ORM, e.g., to create a composite primary key.
Improving performance
This page contains tips that may help to increase indexing speed.
Optimize database schema
Postgres indexes are tables that Postgres can use to speed up data lookup. A database index acts like a pointer to data in a table, just like an index in a printed book. If you look in the index first, you will find the data much quicker than searching the whole book (or — in this case — database).
You should add indexes on columns that often appear in WHERE clauses in your GraphQL queries and subscriptions.
Tortoise ORM uses BTree indexes by default. To set an index on a field, add index=True to the field definition:
from tortoise import Model, fields


class Trade(Model):
    id = fields.BigIntField(pk=True)
    amount = fields.BigIntField()
    level = fields.BigIntField(index=True)
    timestamp = fields.DatetimeField(index=True)
Tune datasources
All datasources now share the same code under the hood to communicate with underlying APIs via HTTP. Configs of all datasources and also Hasura's one can have an optional section http
with any number of the following parameters set:
datasources:
  tzkt:
    kind: tzkt
    ...
    http:
      cache: True
      retry_count: 10
      retry_sleep: 1
      retry_multiplier: 1.2
      ratelimit_rate: 100
      ratelimit_period: 60
      connection_limit: 25
      batch_size: 10000
hasura:
  url: http://hasura:8080
  http:
    ...
field | description |
---|---|
cache | Whether to cache responses |
retry_count | Number of retries after request failed before giving up |
retry_sleep | Sleep time between retries |
retry_multiplier | Multiplier for sleep time between retries |
ratelimit_rate | Number of requests per period ("drops" in leaky bucket) |
ratelimit_period | Period for rate limiting in seconds |
connection_limit | Number of simultaneous connections |
connection_timeout | Connection timeout in seconds |
batch_size | Number of items fetched in a single paginated request (for some APIs) |
Each datasource has its defaults. Usually, there's no reason to alter these settings unless you use self-hosted instances of TzKT or other datasource.
By default, DipDup retries failed requests infinitely, exponentially increasing the delay between attempts. Set the retry_count parameter to limit the number of attempts.
The batch_size parameter is TzKT-specific. By default, DipDup limits requests to 10,000 items, the maximum value allowed on public instances provided by Baking Bad. Decreasing this value will reduce the time required for TzKT to process a single request and thus reduce the load. You can achieve a similar effect by reducing the connection_limit parameter (limited to synchronizing multiple indexes concurrently).
🤓 SEE ALSO
See 12.4. datasources for details.
Use TimescaleDB for time-series
🚧 UNDER CONSTRUCTION
This page or paragraph is yet to be written. Come back later.
DipDup is fully compatible with TimescaleDB. Try its "continuous aggregates" feature, especially if dealing with market data like DEX quotes.
Cache commonly used models
If your indexer contains models that have few fields and are used primarily in relations, you can cache such models during synchronization.
Example code:
from collections import OrderedDict

from tortoise import Model, fields


class Trader(Model):
    address = fields.CharField(36, pk=True)


class TraderCache:
    def __init__(self, size: int = 1000) -> None:
        self._size = size
        self._traders: OrderedDict[str, Trader] = OrderedDict()

    async def get(self, address: str) -> Trader:
        if address not in self._traders:
            # NOTE: Already created on origination
            self._traders[address], _ = await Trader.get_or_create(address=address)
            if len(self._traders) > self._size:
                self._traders.popitem(last=False)
        return self._traders[address]

    def clear(self) -> None:
        self._traders.clear()


trader_cache = TraderCache()
Use trader_cache.get in handlers. After sync is complete, you can clear this cache to free some RAM:
async def on_synchronized(
    ctx: HookContext,
) -> None:
    ...
    models.trader_cache.clear()
Callback context (ctx)
🚧 UNDER CONSTRUCTION
This page or paragraph is yet to be written. Come back later.
An instance of the HandlerContext
class is passed to every handler providing a set of helper methods and read-only properties.
.reindex() -> None
Drops the entire database and starts the indexing process from scratch. The on_index_rollback hook calls this helper by default.
.add_contract(name, address, typename) -> Coroutine
Add a new contract to the inventory.
.add_index(name, template, values) -> Coroutine
Add a new index to the current configuration.
.fire_hook(name, wait=True, **kwargs) -> None
Trigger hook execution. Unset wait to execute the hook outside of the current database transaction.
.execute_sql(name) -> None
The execute_sql argument could be either the name of an SQL script in the sql directory or an absolute/relative path. If the path is a directory, all .sql scripts within it will be executed in alphabetical order.
.update_contract_metadata(network, address, metadata) -> None
Inserts or updates the corresponding row in the service dipdup_contract_metadata table used for exposing the Metadata interface (see 5.11).
.update_token_metadata(network, address, token_id, metadata) -> None
Inserts or updates the corresponding row in the service dipdup_token_metadata table used for exposing the Metadata interface (see 5.11).
.logger
Use this instance for logging.
.template_values
You can access values used for initializing a template index.
- class dipdup.context.DipDupContext(datasources: Dict[str, dipdup.datasources.datasource.Datasource], config: dipdup.config.DipDupConfig, callbacks: dipdup.context.CallbackManager)
  Class to store application context.
  Parameters:
  - datasources – Mapping of available datasources
  - config – DipDup configuration
  - callbacks – Low-level callback interface (intended for internal use)
  - logger – Context-aware logger instance
- async execute_sql(name: str) -> None
  Execute SQL script with given name.
  Parameters:
  - name – SQL script name within <project>/sql directory
- async fire_handler(name: str, index: str, datasource: dipdup.datasources.tzkt.datasource.TzktDatasource, fmt: Optional[str] = None, *args, **kwargs: Any) -> None
  Fire handler with given name and arguments.
  Parameters:
  - name – Handler name
  - index – Index name
  - datasource – An instance of datasource that triggered the handler
  - fmt – Format string for ctx.logger messages
- async fire_hook(name: str, fmt: Optional[str] = None, wait: bool = True, *args, **kwargs: Any) -> None
  Fire hook with given name and arguments.
  Parameters:
  - name – Hook name
  - fmt – Format string for ctx.logger messages
  - wait – Wait for hook to finish or fire and forget
- async reindex(reason: Optional[Union[str, dipdup.enums.ReindexingReason]] = None, **context) -> None
  Drop the whole database and restart with the same CLI arguments.
- async restart() -> None
  Restart indexer preserving CLI arguments.
- class dipdup.context.HandlerContext(datasources: Dict[str, dipdup.datasources.datasource.Datasource], config: dipdup.config.DipDupConfig, callbacks: dipdup.context.CallbackManager, logger: dipdup.utils.FormattedLogger, handler_config: dipdup.config.HandlerConfig, datasource: dipdup.datasources.tzkt.datasource.TzktDatasource)
  Common handler context.
- class dipdup.context.HookContext(datasources: Dict[str, dipdup.datasources.datasource.Datasource], config: dipdup.config.DipDupConfig, callbacks: dipdup.context.CallbackManager, logger: dipdup.utils.FormattedLogger, hook_config: dipdup.config.HookConfig)
  Hook callback context.
- class dipdup.context.TemplateValuesDict(ctx, **kwargs)
Internal models
model | table | description |
---|---|---|
dipdup.models.Schema | dipdup_schema | Hash of database schema to detect changes that require reindexing. |
dipdup.models.Index | dipdup_index | Indexing status, level of the latest processed block, template, and template values if applicable. Relates to Head when status is REALTIME (see dipdup.models.IndexStatus for possible values of status field) |
dipdup.models.Head | dipdup_head | The latest block received by a datasource from a WebSocket connection. |
dipdup.models.Contract | dipdup_contract | Nothing useful for us humans. It helps DipDup to keep track of dynamically spawned contracts. A Contract with the same name in the config takes priority over the one from this table. |
With the help of these tables, you can set up monitoring of DipDup deployment to know when something goes wrong:
SELECT NOW() - timestamp FROM dipdup_head;
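The same check can be done from a hook using the internal Head model. This is a hedged sketch assuming the model exposes the timestamp column used in the query above; the hook name and threshold are hypothetical.

from datetime import datetime, timedelta, timezone

from dipdup.context import HookContext
from dipdup.models import Head


async def check_head(ctx: HookContext) -> None:
    threshold = timedelta(minutes=5)
    for head in await Head.all():
        # Assumes timezone-aware timestamps stored in dipdup_head
        lag = datetime.now(timezone.utc) - head.timestamp
        if lag > threshold:
            ctx.logger.warning('Datasource head is %s behind the wall clock', lag)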
Spawning indexes at runtime
DipDup allows spawning new indexes from a template at runtime. There are two ways to do that:
- From another index (e.g., handling factory originations)
- In the on_restart hook
⚠ WARNING
DipDup is currently not able to automatically generate types and handlers for template indexes unless there is at least one static instance.
DipDup exposes several context methods that extend the current configuration with new contracts and template instances. See 5.8. Handler context for details.
See 12.13. templates for details.
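For illustration, a hedged sketch of the first approach: a handler that registers a contract originated by a factory and spawns a template index for it. The contract, template, and field names here are hypothetical; ctx.add_contract and ctx.add_index are described in 5.8.

from dipdup.context import HandlerContext


async def on_factory_origination(ctx: HandlerContext, origination) -> None:
    # Hypothetical: the address of the newly originated pool contract
    pool_address = origination.data.originated_contract_address
    name = f'pool_{pool_address}'
    await ctx.add_contract(name=name, address=pool_address, typename='pool')
    await ctx.add_index(name=f'{name}_index', template='pool_template', values={'contract': name})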
Scheduler configuration
DipDup uses the apscheduler library to run hooks according to the schedules in the jobs config section. In the following example, apscheduler will spawn up to three instances of the same job every time the trigger fires, even if previous runs are still in progress:
advanced:
  scheduler:
    apscheduler.job_defaults.coalesce: True
    apscheduler.job_defaults.max_instances: 3
See the apscheduler docs for details.
Note that you can't use executors from the apscheduler.executors.pool module: a ConfigurationError exception will be raised.
Metadata Interface
When issuing a token on the Tezos blockchain, there is an important yet insufficiently covered aspect: how various ecosystem applications (wallets, explorers, marketplaces, and others) will display and interact with it. It's about token metadata, stored wholly or partially on-chain but intended for off-chain use only.
Token metadata standards
There are several standards regulating the metadata file format and the way it can be stored and exposed to consumers:
- TZIP-21 | Rich Metadata — describes a metadata schema and standards for contracts and tokens
- TZIP-12 | FA2.0 — a standard for a unified token contract interface, includes an article about how to store and encode token metadata
- TZIP-7 | FA1.2 — single asset token standard; reuses the token metadata approach from FA2.0
Keeping aside the metadata schema, let's focus on which approaches for storing are currently standardized, their pros and cons, and what to do if any of the options available do not fit your case.
Basic: on-chain links / off-chain storage
The most straightforward approach is to store everything in the contract storage, especially if it's just the basic fields (name, symbol, decimals):
storage
└── token_metadata [big_map]
    └── 0
        ├── token_id: 0
        └── token_info
            ├── name: ""
            ├── symbol: ""
            └── decimals: ""
But typically, you want to store more like a token thumbnail icon, and it is no longer feasible to keep such large data on-chain (because you pay gas for every byte stored).
Then you can put large files somewhere off-chain (e.g., IPFS) and store just links:
storage
└── token_metadata [big_map]
    └── 0
        ├── token_id: 0
        └── token_info
            ├── ...
            └── thumbnailUri: "ipfs://"
This approach is still costly, but sometimes (in rare cases), you need to have access to the metadata from the contract (example: Dogami).
We can go further and put the entire token info structure to IPFS:
storage
└── token_metadata [big_map]
    └── 0
        ├── token_id: 0
        └── token_info
            └── "": "ipfs://"
It is the most common case right now (example: HEN).
The main advantage of the basic approach is that all the changes applied to token metadata will result in big map diffs that are easily traceable by indexers. Even if you decide to replace the off-chain file, it will cause the IPFS link to change. In the case of HTTP links, indexers cannot detect the content change; thus, token metadata won't be updated.
Custom: off-chain view
The second approach presented in the TZIP-12 spec was intended to cover cases when there's a need to reuse the same token info or when it's not possible to expose the %token_metadata big map in the standard form. Instead, it's offered to execute a special Michelson script against the contract storage and treat the result as the token info for the particular (requested) token. The tricky part is that the script code itself is typically stored off-chain, and the whole algorithm would look like this:
- Try to fetch the empty string key of the %metadata big map to retrieve the TZIP-16 file location
- Resolve the TZIP-16 file (typically from IPFS) — it should contain the off-chain view body
- Fetch the current contract storage
- Build arguments for the off-chain view token_metadata using the fetched storage and the requested token ID
- Execute the script using Tezos node RPC
Although this approach is more or less viable for wallets (when you need to fetch metadata for a relatively small amount of tokens), it becomes very inefficient for indexers dealing with millions of tokens:
- After every contract origination, one has to try to fetch the views (even if there aren't any) — it means synchronous fetching, which can take seconds in the case of IPFS
- Executing a Michelson script is currently only* possible via Tezos node, and it's quite a heavy call (setting up the VM and contract context takes time)
- There's no clear way to detect new token metadata addition or change — that is actually the most critical one; you never know for sure when to call the view
The off-chain view approach is not supported by the TzKT indexer, and we strongly recommend not using it, especially for contracts that can issue multiple tokens.
DipDup-based solution
The alternative we offer for the very non-standard cases is using our selective indexing framework for custom token metadata retrieval and then feeding it back to the TzKT indexer, which essentially acts as a metadata aggregator. Note that while this can seem like a circular dependency, it's resolved on the interface level: all custom DipDup metadata indexers should expose specific GraphQL tables with certain fields:
query MyQuery {
  token_metadata {
    metadata    # TZIP-21 JSON
    network     # mainnet or <protocol>net
    contract    # token contract address
    token_id    # token ID in the scope of the contract
    update_id   # integer cursor used for pagination
  }
}
DipDup handles table management for you and exposes a context-level helper.
Tezos Domains example:
await ctx.update_token_metadata(
    network=ctx.datasource.network,
    address=store_records.data.contract_address,
    token_id=token_id,
    metadata={
        'name': record_name,
        'symbol': 'TD',
        'decimals': '0',
        'isBooleanAmount': True,
        'domainData': decode_domain_data(store_records.value.data),
    },
)
TzKT can be configured to subscribe to one or multiple DipDup metadata sources; here's what we currently use in production:
- Generic TZIP-16/TZIP-12 metadata indexer Github | Playground
- Tezos Domains metadata indexer Github | Playground
- Ubisoft Quartz metadata indexer Github | Playground
GraphQL API
In this section, we assume you use Hasura GraphQL Engine integration to power your API.
Before starting to do client integration, it's good to know the specifics of Hasura GraphQL protocol implementation and the general state of the GQL ecosystem.
Queries
By default, Hasura generates three types of queries for each table in your schema:
- Generic query enabling filters by all columns
- Single item query (by primary key)
- Aggregation query (can be disabled)
All the GQL features such as fragments, variables, aliases, directives are supported, as well as batching.
Read more in Hasura docs.
It's important to understand that GraphQL query is just a POST request with JSON payload, and in some instances, you don't need a complicated library to talk to your backend.
Pagination
By default, Hasura does not restrict the number of rows returned per request, which could lead to abuses and heavy load to your server. You can set up limits in the configuration file. See 12.5. hasura for details. But then you will face the need to paginate over the items if the response does not fit into the limits.
Subscriptions
From Hasura documentation:
Hasura GraphQL engine subscriptions are live queries, i.e., a subscription will return the latest result of the query and not necessarily all the individual events leading up to it.
This feature is essential to avoid complex state management (merging query results and subscription feed). In most scenarios, live queries are what you need to sync the latest changes from the backend.
⚠ WARNING
If the live query has a significant response size that does not fit into the limits, you need one of the following:
- Paginate with offset (which is not convenient)
- Use cursor-based pagination (e.g., by an increasing unique id).
- Narrow down request scope with filtering (e.g., by timestamp or level).
Ultimately, you can get "subscriptions" on top of live queries by requesting all the items having an ID greater than the maximum existing one, or all the items with a timestamp greater than now.
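For illustration, here's a sketch of the cursor-based approach using plain POST requests (as noted above, a GraphQL query is just JSON over HTTP). The endpoint and the hic_et_nunc_token field mirror the demo examples below and are assumptions about your schema.

import requests

ENDPOINT = 'http://127.0.0.1:8080/v1/graphql'
QUERY = '''
query ($since: bigint!) {
  hic_et_nunc_token(where: {id: {_gt: $since}}, order_by: {id: asc}, limit: 100) {
    id
    supply
    timestamp
  }
}
'''

cursor = 0
while True:
    response = requests.post(ENDPOINT, json={'query': QUERY, 'variables': {'since': cursor}})
    rows = response.json()['data']['hic_et_nunc_token']
    if not rows:
        break  # caught up; poll again later or switch to a live query
    for row in rows:
        ...  # merge the row into local state
    cursor = rows[-1]['id']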
Websocket transport
Hasura is compatible with the subscriptions-transport-ws library, which is currently deprecated but still used by the majority of clients.
Mutations
The purpose of DipDup is to create indexers, which means you can consistently reproduce the state as long as data sources are accessible. It makes your backend "stateless" in a sense because it's tolerant of data loss.
However, you might need to introduce a non-recoverable state and mix indexed and user-generated content in some cases. DipDup allows marking these UGC tables "immune", protecting them from being wiped. In addition to that, you will need to set up Hasura Auth and adjust write permissions for the tables (by default, they are read-only).
Lastly, you will need to execute GQL mutations to modify the state from the client side. Read more about how to do that with Hasura.
Hasura integration
This optional config section is used by the DipDup executor to automatically configure the Hasura engine to track your tables.
hasura:
  url: http://hasura:8080
  admin_secret: ${HASURA_ADMIN_SECRET:-changeme}
Under the hood, DipDup generates Hasura metadata from your DB schema and applies it using Metadata API.
Hasura metadata is all about data representation in GraphQL API. The structure of the database itself is managed solely by Tortoise ORM.
Metadata configuration is idempotent: each time you call the run or hasura configure command, DipDup queries the existing schema and performs a merge if required. DipDup configures Hasura after reindexing, saves the hash of the resulting metadata in the dipdup_schema table, and doesn't touch Hasura again until needed.
Database limitations
The current version of Hasura GraphQL Engine treats public and other schemas differently. Table schema.customer becomes the schema_customer root field (or schemaCustomer if the camel_case option is enabled in the DipDup config). Table public.customer becomes the customer field, without a schema prefix. There's no way to remove this prefix for now. You can track the related issue at Hasura's GitHub to know when the situation changes. Since 3.0.0-rc1, DipDup enforces the public schema name to avoid ambiguity and issues with the GenQL library. You can still use any schema name if Hasura integration is not enabled.
Authentication
DipDup sets read-only permissions for all tables and enables non-authorized access to the /graphql endpoint.
Limit number of rows
DipDup creates a user role allowed to perform queries without authorization. You can limit the maximum number of rows such queries return and also disable aggregation queries that are automatically generated by Hasura:
hasura:
  select_limit: 100
Note that with limits enabled, you have to use either offset or cursor-based pagination on the client-side.
Disable aggregation queries
hasura:
  allow_aggregations: False
Convert field names to camel case
For those of you coming from the JavaScript world, it may be more familiar to use camelCase for variable names instead of the snake_case Hasura uses by default. DipDup allows converting all fields in the metadata to this casing:
hasura:
  camel_case: true
Now this example query to hic et nunc demo indexer...
query MyQuery {
  hic_et_nunc_token(limit: 1) {
    id
    creator_id
  }
}
...will become this one:
query MyQuery {
  hicEtNuncToken(limit: 1) {
    id
    creatorId
  }
}
All fields auto-generated by Hasura will be renamed accordingly: hic_et_nunc_token_by_pk to hicEtNuncTokenByPk, delete_hic_et_nunc_token to deleteHicEtNuncToken, and so on. To return to the defaults, set camel_case to False and run hasura configure --force.
Keep in mind that "camelcasing" is a separate stage performed after all tables are registered. So during configuration, you can observe fields in snake_case for several seconds even if the hasura.camel_case flag is set.
REST endpoints
Hasura 2.0 introduced the ability to expose arbitrary GraphQL queries as REST endpoints. By default, DipDup will generate GET and POST endpoints to fetch rows by primary key for all tables:
curl http://127.0.0.1:8080/api/rest/hicEtNuncHolder?address=tz1UBZUkXpKGhYsP5KtzDNqLLchwF4uHrGjw
{
  "hicEtNuncHolderByPk": {
    "address": "tz1UBZUkXpKGhYsP5KtzDNqLLchwF4uHrGjw"
  }
}
However, there's a limitation dictated by how Hasura parses HTTP requests: only models with primary keys of basic types (int, string, and so on) can be fetched with GET requests. An attempt to fetch a model with a BIGINT primary key will lead to the error: Expected bigint for variable id got Number.
A workaround to fetching any model is to send a POST request containing a JSON payload with a single key:
curl -d '{"id": 152}' http://127.0.0.1:8080/api/rest/hicEtNuncToken
{
  "hicEtNuncTokenByPk": {
    "creatorId": "tz1UBZUkXpKGhYsP5KtzDNqLLchwF4uHrGjw",
    "id": 152,
    "level": 1365242,
    "supply": 1,
    "timestamp": "2021-03-01T03:39:21+00:00"
  }
}
We hope to get rid of this limitation someday and will let you know as soon as it happens.
Custom endpoints
You can put any number of .graphql files into the graphql directory in your project's root, and DipDup will create REST endpoints for each of those queries. Let's say we want to fetch not only a specific token but also the number of all tokens minted by its creator:
query token_and_mint_count($id: bigint) {
  hicEtNuncToken(where: {id: {_eq: $id}}) {
    creator {
      address
      tokens_aggregate {
        aggregate {
          count
        }
      }
    }
    id
    level
    supply
    timestamp
  }
}
Save this query as graphql/token_and_mint_count.graphql and run dipdup configure-hasura. Now this query is available via the REST endpoint at http://127.0.0.1:8080/api/rest/token_and_mint_count.
You can disable exposing of REST endpoints in the config:
hasura:
  rest: False
GenQL
GenQL is a great library and CLI tool that automatically generates a fully typed SDK with a built-in GQL client. It works flawlessly with Hasura and is recommended for DipDup on the client-side.
Project structure
GenQL CLI generates a ready-to-use package, compiled and prepared to publish to NPM. A typical setup is a mono repository containing several packages, including the auto-generated SDK and your front-end application.
project_root/
├── package.json
└── packages/
    ├── app/
    │   ├── package.json
    │   └── src/
    └── sdk/
        └── package.json
SDK package config
Your minimal package.json file will look like the following:
{
  "name": "%PACKAGE_NAME%",
  "version": "0.0.1",
  "main": "dist/index.js",
  "types": "dist/index.d.ts",
  "devDependencies": {
    "@genql/cli": "^2.6.0"
  },
  "dependencies": {
    "@genql/runtime": "2.6.0",
    "graphql": "^15.5.0"
  },
  "scripts": {
    "build": "genql --endpoint %GRAPHQL_ENDPOINT% --output ./dist"
  }
}
That's it! Now you only need to install dependencies and execute the build target:
yarn
yarn build
Read more about the available CLI options.
Demo
Create a package.json file with:
- %PACKAGE_NAME% => metadata-sdk
- %GRAPHQL_ENDPOINT% => https://metadata.dipdup.net/v1/graphql
And generate the client:
yarn
yarn build
Then create a new file index.ts and paste this query:
import { createClient, everything } from './dist'

const client = createClient()

client.chain.query
  .token_metadata({ where: { network: { _eq: 'mainnet' } } })
  .get({ ...everything })
  .then(res => console.log(res))
We need some additional dependencies to run our sample:
yarn add typescript ts-node
Finally:
npx ts-node index.ts
You should see a list of tokens with metadata attached in your console.
Troubleshooting
🚧 UNDER CONSTRUCTION
This page or paragraph is yet to be written. Come back later.
Common issues
MigrationRequiredError
Reason
DipDup was updated to a release whose spec_version differs from the value in the config file. You need to perform an automatic migration before starting indexing again.
Solution
- Run the dipdup migrate command.
- Review and commit the changes.
ReindexingRequiredError
Reason
There can be several possible reasons that require reindexing from scratch:
- Your DB models or your config (and thus likely your handlers) changed, which means that all the previous data is probably incorrect or will be inconsistent with the new data. Of course, you can handle that manually or write a migration — luckily, there is a way to disable reindexing for such cases.
- Also, DipDup internal models or some raw indexing mechanisms changed (e.g., a serious bug was fixed), and, unfortunately, it is required to re-run the indexer. Sometimes those changes do not affect your particular case, and you can skip the reindexing part.
- Finally, there are chain reorgs happening from time to time, and if you don't have your on_index_rollback handler implemented, be ready for those errors. Luckily, there is a generic approach to mitigate that: just wait for another block before applying the previous one, i.e., introduce a lag into the indexing process.
Solution
You can set how to react in each of the cases described. Here's an example setup:
advanced:
  reindex:
    manual: exception
    migration: exception
    rollback: exception
    config_modified: ignore
    schema_modified: ignore
To index with a lag, add this TzKT datasource preference:
datasources:
  tzkt_mainnet:
    kind: tzkt
    url: ${TZKT_URL:-https://api.tzkt.io}
    buffer_size: 1  # <--- one-level reorgs are most common, 2-level reorgs are super rare
Reporting bugs
🚧 UNDER CONSTRUCTION
This page or paragraph is yet to be written. Come back later.
Deployment and operations
This section contains recipes to deploy and maintain DipDup instances.
Database engines
DipDup officially supports the following databases: SQLite, PostgreSQL, TimescaleDB. This page will help you choose a database engine that mostly suits your needs.
 | SQLite | PostgreSQL | TimescaleDB |
---|---|---|---|
Supported versions | any | any | any |
When to use | early development | general usage | working with timeseries |
Performance | good | better | great in some scenarios |
SQL scripts | ❌ | ✅ | ✅ |
Immune tables* | ❌ | ✅ | ✅ |
Hasura integration | ❌ | ✅** | ✅** |
* — see the immune_tables config reference for details.
** — schema name must be public
While sometimes it's convenient to use one database engine for development and another one for production, be careful with specific column types that behave differently in various engines.
Building Docker images
FROM dipdup/dipdup:5.0.0
# Uncomment if you have additional dependencies in pyproject.toml
# COPY pyproject.toml poetry.lock ./
# RUN inject_pyproject
COPY indexer indexer
COPY dipdup.yml dipdup.prod.yml ./
Docker compose
Make sure you have Docker and docker-compose installed.
Example docker-compose.yml
file:
version: "3.8"
services:
indexer:
build: .
depends_on:
- db
command: ["-c", "dipdup.yml", "-c", "dipdup.prod.yml", "run"]
restart: "no"
environment:
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-changeme}
- ADMIN_SECRET=${ADMIN_SECRET:-changeme}
volumes:
- ./dipdup.yml:/home/dipdup/dipdup.yml
- ./dipdup.prod.yml:/home/dipdup/dipdup.prod.yml
- ./indexer:/home/dipdup/indexer
ports:
- 127.0.0.1:9000:9000
db:
image: timescale/timescaledb:latest-pg13
ports:
- 127.0.0.1:5432:5432
volumes:
- db:/var/lib/postgresql/data
environment:
- POSTGRES_USER=dipdup
- POSTGRES_DB=dipdup
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-changeme}
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 10s
timeout: 5s
retries: 5
deploy:
mode: replicated
replicas: 1
hasura:
image: hasura/graphql-engine:v2.4.0
ports:
- 127.0.0.1:8080:8080
depends_on:
- db
restart: always
environment:
- HASURA_GRAPHQL_DATABASE_URL=postgres://dipdup:${POSTGRES_PASSWORD:-changeme}@db:5432/dipdup
- HASURA_GRAPHQL_ENABLE_CONSOLE=true
- HASURA_GRAPHQL_DEV_MODE=true
- HASURA_GRAPHQL_ENABLED_LOG_TYPES=startup, http-log, webhook-log, websocket-log, query-log
- HASURA_GRAPHQL_ADMIN_SECRET=${ADMIN_SECRET:-changeme}
- HASURA_GRAPHQL_UNAUTHORIZED_ROLE=user
- HASURA_GRAPHQL_STRINGIFY_NUMERIC_TYPES=true
volumes:
db:
Environment variables are expanded in the DipDup config file; Postgres password and Hasura secret are forwarded in this example.
Create a separate dipdup.<environment>.yml
file for this stack:
database:
kind: postgres
host: db
port: 5432
user: dipdup
password: ${POSTGRES_PASSWORD:-changeme}
database: dipdup
schema_name: demo
hasura:
url: http://hasura:8080
admin_secret: ${ADMIN_SECRET:-changeme}
allow_aggregations: False
camel_case: true
select_limit: 100
Note the hostnames (resolved in the docker network) and environment variables (expanded by DipDup).
Build and run the containers:
docker-compose up -d --build
We recommend lazydocker for monitoring your application.
Deploying with Docker Swarm
🚧 UNDER CONSTRUCTION
This page or paragraph is yet to be written. Come back later.
Sentry integration
Sentry is an error tracking software that can be used either as a service or on-premise. It dramatically improves the troubleshooting experience and requires nearly zero configuration. To start catching exceptions with Sentry in your project, add the following section in dipdup.yml
config:
sentry:
dsn: https://...
environment: dev
debug: False
You can obtain the Sentry DSN from the web interface at Settings -> Projects -> <project_name> -> Client Keys (DSN). The cool thing is that if you catch an exception and suspect there's a bug in DipDup, you can share this event with us using a public link (created via the Share menu).
Prometheus integration
Available metrics
The following metrics will be exposed:
metric name | description |
---|---|
dipdup_indexes_total | Number of indexes in operation by status |
dipdup_index_level_sync_duration_seconds | Duration of indexing a single level |
dipdup_index_level_realtime_duration_seconds | Duration of last index synchronization |
dipdup_index_total_sync_duration_seconds | Duration of the last index synchronization |
dipdup_index_total_realtime_duration_seconds | Duration of the last index realtime synchronization |
dipdup_index_levels_to_sync_total | Number of levels to reach synced state |
dipdup_index_levels_to_realtime_total | Number of levels to reach realtime state |
dipdup_index_handlers_matched_total | Index total hits |
dipdup_datasource_head_updated_timestamp | Timestamp of the last head update |
dipdup_datasource_rollbacks_total | Number of rollbacks |
dipdup_http_errors_total | Number of http errors |
dipdup_callback_duration_seconds | Duration of callback execution |
Logging
Currently, you have two options to configure logging:
- Manually in the `on_restart` hook:

import logging

from dipdup.context import HookContext


async def on_restart(
    ctx: HookContext,
) -> None:
    logging.getLogger('dipdup').setLevel('DEBUG')
- With Python logging config
⚠ WARNING
This feature will be deprecated soon. Consider configuring logging inside the `on_restart` hook instead.
dipdup -l logging.yml run
Example config:
version: 1
disable_existing_loggers: false
formatters:
brief:
format: "%(levelname)-8s %(name)-20s %(message)s"
handlers:
console:
level: INFO
formatter: brief
class: logging.StreamHandler
stream: ext://sys.stdout
loggers:
dipdup:
level: INFO
aiosqlite:
level: INFO
db_client:
level: INFO
root:
level: INFO
handlers:
- console
Monitoring
🚧 UNDER CONSTRUCTION
This page or paragraph is yet to be written. Come back later.
Backup and restore
DipDup has no built-in functionality to back up and restore the database at the moment. The good news is that DipDup indexes are fully atomic, which means you can perform a backup with regular `psql`/`pg_dump` tools regardless of the DipDup state.
This page contains several recipes for backup/restore.
Scheduled backup to S3
This example is for Swarm deployments. We use this solution to back up our services in production. Adapt it to your needs.
version: "3.8"
services:
indexer:
...
db:
...
hasura:
...
backuper:
image: ghcr.io/dipdup-net/postgres-s3-backup:master
environment:
- S3_ENDPOINT=${S3_ENDPOINT:-https://fra1.digitaloceanspaces.com}
- S3_ACCESS_KEY_ID=${S3_ACCESS_KEY_ID}
- S3_SECRET_ACCESS_KEY=${S3_SECRET_ACCESS_KEY}
- S3_BUCKET=dipdup
- S3_PATH=dipdup
- S3_FILENAME=${SERVICE}-postgres
- PG_BACKUP_FILE=${PG_BACKUP_FILE}
- PG_BACKUP_ACTION=${PG_BACKUP_ACTION:-dump}
- PG_RESTORE_JOBS=${PG_RESTORE_JOBS:-8}
- POSTGRES_USER=${POSTGRES_USER:-dipdup}
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-changeme}
- POSTGRES_DB=${POSTGRES_DB:-dipdup}
- POSTGRES_HOST=${POSTGRES_HOST:-db}
- HEARTBEAT_URI=${HEARTBEAT_URI}
- SCHEDULE=${SCHEDULE}
deploy:
mode: replicated
replicas: ${BACKUP_ENABLED:-0}
restart_policy:
condition: on-failure
delay: 10s
max_attempts: 5
window: 120s
placement: *placement
networks:
- internal
logging: *logging
Automatic restore on rollback
This awesome code was contributed by @852Kerfunkle, author of tz1and project.
<project>/backups.py
...
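# The imports below are elided in the original snippet; the following is an assumption
# of what they look like. The pg_dump/psql calls with _out/_err/_in keyword arguments
# suggest the `sh` library is used to wrap the PostgreSQL CLI tools.
import logging
from io import StringIO

from sh import ErrorReturnCode, pg_dump, psql  # assumed: `sh` package

from dipdup.config import PostgresDatabaseConfig

_logger = logging.getLogger(__name__)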
def backup(level: int, database_config: PostgresDatabaseConfig):
...
with open('backup.sql', 'wb') as f:
try:
err_buf = StringIO()
pg_dump('-d', f'postgresql://{database_config.user}:{database_config.password}@{database_config.host}:{database_config.port}/{database_config.database}', '--clean',
'-n', database_config.schema_name, _out=f, _err=err_buf) #, '-E', 'UTF8'
except ErrorReturnCode:
err = err_buf.getvalue()
_logger.error(f'Database backup failed: {err}')
def restore(level: int, database_config: PostgresDatabaseConfig):
...
with open('backup.sql', 'r') as f:
try:
err_buf = StringIO()
psql('-d', f'postgresql://{database_config.user}:{database_config.password}@{database_config.host}:{database_config.port}/{database_config.database}',
'-n', database_config.schema_name, _in=f, _err=err_buf)
except ErrorReturnCode:
err = err_buf.getvalue()
_logger.error(f'Database restore failed: {err}')
raise Exception("Failed to restore")
def get_available_backups():
...
def delete_old_backups():
...
<project>/hooks/on_index_rollback.py
...
async def on_index_rollback(
ctx: HookContext,
index: Index,
from_level: int,
to_level: int,
) -> None:
await ctx.execute_sql('on_index_rollback')
database_config: Union[SqliteDatabaseConfig, PostgresDatabaseConfig] = ctx.config.database
# if not a postgres db, reindex.
if database_config.kind != "postgres":
await ctx.reindex(ReindexingReason.ROLLBACK)
available_levels = backups.get_available_backups()
# if no backups available, reindex
if not available_levels:
await ctx.reindex(ReindexingReason.ROLLBACK)
# find the right level, i.e. the one that's closest to to_level
chosen_level = 0
for level in available_levels:
if level <= to_level and level > chosen_level:
chosen_level = level
# try to restore or reindex
try:
backups.restore(chosen_level, database_config)
await ctx.restart()
except Exception:
await ctx.reindex(ReindexingReason.ROLLBACK)
<project>/hooks/run_backups.py
...
async def run_backups(
ctx: HookContext,
) -> None:
database_config: Union[SqliteDatabaseConfig, PostgresDatabaseConfig] = ctx.config.database
if database_config.kind != "postgres":
return
level = ctx.get_tzkt_datasource("tzkt_mainnet")._level.get(MessageType.head)
if level is None:
return
backups.backup(level, database_config)
backups.delete_old_backups()
<project>/hooks/simulate_reorg.py
...
async def simulate_reorg(
ctx: HookContext
) -> None:
level = ctx.get_tzkt_datasource("tzkt_mainnet")._level.get(MessageType.head)
if level:
await ctx.fire_hook(
    "on_index_rollback",
    wait=True,
    index=None,  # type: ignore
    from_level=level,
    to_level=level - 2,
)
Cookbook
🚧 UNDER CONSTRUCTION
This page or paragraph is yet to be written. Come back later.
Processing offchain data
🚧 UNDER CONSTRUCTION
This page or paragraph is yet to be written. Come back later.
Reusing typename for different contracts
In some cases, you may want to make manual changes in typeclasses and ensure they won't be lost on init. Let's say you want to reuse a typename for multiple contracts providing the same interface (like FA1.2 and FA2 tokens) but having a different storage structure. You can comment out the differing fields that are not important for your index.
types/contract_typename/storage.py
# dipdup: ignore
...
class ContractStorage(BaseModel):
class Config:
extra = Extra.ignore
some_common_big_map: Dict[str, str]
# unique_big_map_a: Dict[str, str]
# unique_big_map_b: Dict[str, str]
Don't forget the `Extra.ignore` Pydantic hint; otherwise, indexing will fail. Files starting with the `# dipdup: ignore` comment won't be overwritten on init.
Synchronizing multiple handlers/hooks
🚧 UNDER CONSTRUCTION
This page or paragraph is yet to be written. Come back later.
Multiprocessing
It's impossible to use apscheduler
pool executors with hooks because HookContext
is not pickle-serializable. So, they are forbidden now in advanced.scheduler
config. However, thread/process pools can come in handy in many situations, and it would be nice to have them in DipDup context. For now, I can suggest implementing custom commands as a workaround to perform any resource-hungry tasks within them. Put the following code in <project>/cli.py
:
from contextlib import AsyncExitStack
import asyncclick as click
from dipdup.cli import cli, cli_wrapper
from dipdup.config import DipDupConfig
from dipdup.context import DipDupContext
from dipdup.utils.database import tortoise_wrapper
@cli.command(help='Run heavy calculations')
@click.pass_context
@cli_wrapper
async def do_something_heavy(ctx):
config: DipDupConfig = ctx.obj.config
url = config.database.connection_string
models = f'{config.package}.models'
async with AsyncExitStack() as stack:
await stack.enter_async_context(tortoise_wrapper(url, models))
...
if __name__ == '__main__':
cli(prog_name='dipdup', standalone_mode=False) # type: ignore
Then use python -m <project>.cli
instead of dipdup
as an entrypoint. Now you can call do-something-heavy
like any other dipdup
command. dipdup.cli:cli
group handles arguments and config parsing, graceful shutdown, and other boilerplate. The rest is on you; use dipdup.dipdup:DipDup.run
as a reference. And keep in mind that Tortoise ORM is not thread-safe. I aim to implement ctx.pool_apply
and ctx.pool_map
methods to execute code in pools with magic within existing DipDup hooks, but no ETA yet.
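If the heavy task is CPU-bound, you can offload it to a process pool inside such a custom command. Below is a minimal sketch using only the standard library; `heavy_calculation` is a hypothetical placeholder for your own pure function.

import asyncio
from concurrent.futures import ProcessPoolExecutor


def heavy_calculation(arg: int) -> int:
    # Placeholder for a CPU-bound task; must be a module-level function
    # so it can be pickled and sent to worker processes.
    return sum(i * i for i in range(arg))


async def run_in_pool() -> list[int]:
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # Offload chunks of work to separate processes and await the results.
        futures = [loop.run_in_executor(pool, heavy_calculation, n) for n in (10**6, 2 * 10**6)]
        return await asyncio.gather(*futures)

Keep in mind that Tortoise ORM objects should not be passed between processes; exchange plain data and write it to the database from the main process.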
Examples
🚧 UNDER CONSTRUCTION
This page or paragraph is yet to be written. Come back later.
Demo projects
Here are several minimal examples of how to use various DipDup features in real-case scenarios:
Built with DipDup
This page is a brief overview of projects which use DipDup as an indexing solution.
Want to see your project on this page? Create an issue on GitHub!
HicDEX
HicDEX is a Tezos indexer for the hicetnunc.art marketplace. Indexed data is available via a public GraphQL endpoint.
Homebase
Homebase is a web application that enables users to create and manage/use DAOs on the Tezos blockchain. This application aims to help empower community members and developers to launch and participate in Tezos-based DAOs.
Tezos Profiles
Tezos Profiles enables you to associate your online identity with your Tezos account.
Juster
Juster is an on-chain smart contract platform allowing users to take part in an automated betting market by creating events, providing liquidity to them, and making bets.
tz1and
A Virtual World and NFT Marketplace.
Services (plugins)
Services are standalone companion indexers written in Go.
mempool
This is an optional section used by the mempool indexer plugin. It uses contracts
and datasources
aliases as well as the database
connection.
Mempool configuration has two sections: settings
and indexers
(required).
Settings
This section is optional, and so are all of its keys.
mempool:
settings:
keep_operations_seconds: 172800
expired_after_blocks: 60
keep_in_chain_blocks: 10
mempool_request_interval_seconds: 10
rpc_timeout_seconds: 10
indexers:
...
keep_operations_seconds
How long to store operations that did not get into the chain. After that period, such operations will be wiped from the database. Default value is 172800 seconds (2 days).
expired_after_blocks
When `level(head) - level(operation.branch) >= expired_after_blocks` and the operation is still not included in the chain, it's marked as expired. Default value is 60 blocks (~1 hour).
keep_in_chain_blocks
Since the main purpose of this plugin is to index mempool operations (actually, it's a rolling index), all the operations that were included in the chain are removed from the database after a specified period of time. Default value is 10 blocks (~10 minutes).
mempool_request_interval_seconds
How often Tezos nodes should be polled for pending mempool operations. Default value is 10 seconds.
rpc_timeout_seconds
Tezos node request timeout. Default value is 10 seconds.
Indexers
You can index several networks at once, or index different nodes independently. Indexer names are not standardized, but for clarity it's better to stick with some meaningful keys:
mempool:
settings:
...
indexers:
mainnet:
filters:
kinds:
- transaction
accounts:
- contract_alias
datasources:
tzkt: tzkt_mainnet
rpc:
- node_mainnet
edonet:
florencenet:
Each indexer object has two keys: filters
and datasources
(required).
Filters
An optional section specifying which mempool operations should be indexed. By default all transactions will be indexed.
kinds
Array of operation kinds; the default value is `transaction` (a single item).
The complete list of values allowed:
activate_account
ballot
delegation*
double_baking_evidence
double_endorsement_evidence
endorsement
origination*
proposal
reveal*
seed_nonce_revelation
transaction*
*
— manager operations.
accounts
Array of contract aliases used to filter operations by source or destination.
NOTE: applied to manager operations only.
Datasources
Mempool plugin is tightly coupled with TzKT and Tezos node providers.
tzkt
An alias pointing to a datasource of kind tzkt
is expected.
rpc
An array of aliases pointing to datasources of kind `tezos-node`. Polling multiple nodes allows detecting more refused operations and makes indexing more robust in general.
metadata
This is an optional section used by the metadata indexer plugin. It uses contracts
and datasources
aliases as well as the database
connection.
Metadata configuration has two required sections: `settings` and `indexers`.
Settings
metadata:
settings:
ipfs_gateways:
- https://cloudflare-ipfs.com
ipfs_timeout: 10
http_timeout: 10
max_retry_count_on_error: 3
contract_service_workers: 15
token_service_workers: 75
indexers:
...
ipfs_gateways
An array of IPFS gateways. The indexer polls them sequentially until it gets a result or runs out of attempts. It is recommended to specify more than one gateway to overcome propagation issues, rate limits, and other problems.
ipfs_timeout
How long DipDup will wait for a single IPFS gateway response. Default value is 10 seconds.
http_timeout
How long DipDup will wait for a HTTP server response. Default value is 10 seconds.
max_retry_count_on_error
If DipDup fails to get a response from IPFS gateway or HTTP server, it will try again after some time, until it runs out of attempts. Default value is 3 attempts.
contract_service_workers
Count of contract service workers that resolve contract metadata. Default value is 5.
token_service_workers
Count of token service workers that resolve token metadata. Default value is 5.
Indexers
You can index several networks at once, or go with a single one. Indexer names are not standardized, but for clarity it's better to stick with some meaningful keys:
metadata:
settings:
...
indexers:
mainnet:
filters:
accounts:
- contract_alias
datasources:
tzkt: tzkt_mainnet
Each indexer object has two keys: filters
and datasources
(required).
Filters
accounts
Array of contract aliases used to filter Big_map updates by the owner contract address.
Datasources
Metadata plugin is tightly coupled with TzKT provider.
tzkt
An alias pointing to a datasource of kind tzkt
is expected.
dipdup¶
Manage and run DipDup indexers.
Full docs: https://dipdup.net/docs
Report an issue: https://github.com/dipdup-net/dipdup-py/issues
dipdup [OPTIONS] COMMAND [ARGS]...
Options
- --version¶
Show the version and exit.
- -c, --config <config>¶
A path to DipDup project config (default: dipdup.yml).
- -e, --env-file <env_file>¶
A path to .env file containing KEY=value strings.
- -l, --logging-config <logging_config>¶
A path to Python logging config in YAML format.
cache¶
Manage internal cache.
dipdup cache [OPTIONS] COMMAND [ARGS]...
clear¶
Clear request cache of DipDup datasources.
dipdup cache clear [OPTIONS]
show¶
Show information about DipDup disk caches.
dipdup cache show [OPTIONS]
config¶
Commands to manage DipDup configuration.
dipdup config [OPTIONS] COMMAND [ARGS]...
env¶
Dump environment variables used in DipDup config.
If a variable is not set, its default value will be used.
dipdup config env [OPTIONS]
Options
- -f, --file <file>¶
Output to file instead of stdout.
export¶
Print config after resolving all links and templates.
WARNING: Avoid sharing the output with third parties when the --unsafe flag is set; it may contain secrets!
dipdup config export [OPTIONS]
Options
- --unsafe¶
Resolve environment variables or use default values from config.
hasura¶
Hasura integration related commands.
dipdup hasura [OPTIONS] COMMAND [ARGS]...
configure¶
Configure Hasura GraphQL Engine to use with DipDup.
dipdup hasura configure [OPTIONS]
Options
- --force¶
Proceed even if Hasura is already configured.
init¶
Generate project tree, missing callbacks and types.
This command is idempotent, meaning it won’t overwrite previously generated files unless asked explicitly.
dipdup init [OPTIONS]
Options
- --overwrite-types¶
Regenerate existing types.
- --keep-schemas¶
Do not remove JSONSchemas after generating types.
migrate¶
Migrate project to the new spec version.
If you’re getting MigrationRequiredError after updating DipDup, this command will fix imports and type annotations to match the current spec_version. Review and commit changes after running it.
dipdup migrate [OPTIONS]
run¶
Run indexer.
Execution can be gracefully interrupted with Ctrl+C or SIGTERM signal.
dipdup run [OPTIONS]
Options
- --postpone-jobs¶
Do not start job scheduler until all indexes are synchronized.
- --early-realtime¶
Establish a realtime connection before all indexes are synchronized.
- --merge-subscriptions¶
Subscribe to all operations/big map diffs during realtime indexing.
- --metadata-interface¶
Enable metadata interface.
schema¶
Manage database schema.
dipdup schema [OPTIONS] COMMAND [ARGS]...
approve¶
Continue to use existing schema after reindexing was triggered.
dipdup schema approve [OPTIONS]
export¶
Print SQL schema including scripts from sql/on_reindex.
This command may help you debug inconsistency between project models and expected SQL schema.
dipdup schema export [OPTIONS]
init¶
Prepare a database for running DipDup.
This command creates tables based on your models, then executes sql/on_reindex to finish preparation - the same things DipDup does when run on a clean database.
dipdup schema init [OPTIONS]
wipe¶
Drop all database tables, functions and views.
WARNING: This action is irreversible! All indexed data will be lost!
dipdup schema wipe [OPTIONS]
Options
- --immune¶
Drop immune tables too.
- --force¶
Skip confirmation prompt.
status¶
Show the current status of indexes in the database.
dipdup status [OPTIONS]
Config file reference
DipDup configuration is stored in YAML files of a specific format. By default, DipDup searches for dipdup.yml
file in the current working directory, but you can provide any path with a -c
CLI option:
dipdup -c configs/config.yml run
General structure
DipDup configuration file consists of several logical blocks:
- Header: spec_version*, package*
- Inventory: database*, contracts*, datasources*
- Index definitions: indexes, templates
- Integrations: sentry, hasura
- Hooks: hooks, jobs
* — required sections
Environment variables
DipDup supports compose-style variable expansion with optional default value:
field: ${ENV_VAR:-default_value}
You can use environment variables throughout the configuration file, except for property names (YAML object keys).
Merging config files
DipDup allows you to customize the configuration for a specific environment or workflow. It works similarly to docker-compose, but only for top-level sections. If you want to override a nested property, you need to recreate the whole top-level section. To merge several DipDup config files, provide the -c command-line option multiple times:
dipdup -c dipdup.yml -c dipdup.prod.yml run
Run the `config export` command if you are unsure about the final config used by DipDup.
- class dipdup.config.AdvancedConfig(reindex: typing.Dict[dipdup.enums.ReindexingReason, dipdup.enums.ReindexingAction] = <factory>, scheduler: typing.Optional[typing.Dict[str, typing.Any]] = None, postpone_jobs: bool = False, early_realtime: bool = False, merge_subscriptions: bool = False, metadata_interface: bool = False, skip_version_check: bool = False)¶
Feature flags and other advanced config.
- Parameters
reindex – Mapping of reindexing reasons and actions DipDup performs
scheduler – apscheduler scheduler config
postpone_jobs – Do not start job scheduler until all indexes are in realtime state
early_realtime – Establish realtime connection immediately after startup
merge_subscriptions – Subscribe to all operations instead of exact channels
metadata_interface – Expose metadata interface for TzKT
skip_version_check – Do not check for new DipDup versions on startup
- class dipdup.config.BigMapHandlerConfig(callback: str, contract: Union[str, dipdup.config.ContractConfig], path: str)¶
Big map handler config
- Parameters
contract – Contract to fetch big map from
path – Path to big map (alphanumeric string with dots)
- initialize_big_map_type(package: str) None ¶
Resolve imports and initialize key and value type classes
- class dipdup.config.BigMapIndexConfig(kind: Literal['big_map'], datasource: Union[str, dipdup.config.TzktDatasourceConfig], handlers: Tuple[dipdup.config.BigMapHandlerConfig, ...], skip_history: dipdup.enums.SkipHistory = SkipHistory.never, first_level: int = 0, last_level: int = 0)¶
Big map index config
- Parameters
kind – always big_map
datasource – Index datasource to fetch big maps with
handlers – Description of big map diff handlers
skip_history – Fetch only current big map keys ignoring historical changes
first_level – Level to start indexing from
last_level – Level to stop indexing at (DipDup will terminate at this level)
- class dipdup.config.CallbackMixin(callback: str)¶
Mixin for callback configs
- Parameters
callback – Callback name
- class dipdup.config.CodegenMixin¶
Base for pattern config classes containing methods required for codegen
- locate_arguments() Dict[str, Optional[Type]] ¶
Try to resolve scope annotations for arguments
- class dipdup.config.CoinbaseDatasourceConfig(kind: Literal['coinbase'], api_key: Optional[str] = None, secret_key: Optional[str] = None, passphrase: Optional[str] = None, http: Optional[dipdup.config.HTTPConfig] = None)¶
Coinbase datasource config
- Parameters
kind – always ‘coinbase’
api_key – API key
secret_key – API secret key
passphrase – API passphrase
http – HTTP client configuration
- class dipdup.config.ContractConfig(address: str, typename: Optional[str] = None)¶
Contract config
- Parameters
address – Contract address
typename – User-defined alias for the contract script
- class dipdup.config.DipDupConfig(spec_version: str, package: str, datasources: typing.Dict[str, typing.Union[dipdup.config.TzktDatasourceConfig, dipdup.config.CoinbaseDatasourceConfig, dipdup.config.MetadataDatasourceConfig, dipdup.config.IpfsDatasourceConfig, dipdup.config.HttpDatasourceConfig]], database: typing.Union[dipdup.config.SqliteDatabaseConfig, dipdup.config.PostgresDatabaseConfig] = SqliteDatabaseConfig(kind='sqlite', path=':memory:'), contracts: typing.Dict[str, dipdup.config.ContractConfig] = <factory>, indexes: typing.Dict[str, typing.Union[dipdup.config.OperationIndexConfig, dipdup.config.BigMapIndexConfig, dipdup.config.HeadIndexConfig, dipdup.config.TokenTransferIndexConfig, dipdup.config.IndexTemplateConfig]] = <factory>, templates: typing.Dict[str, typing.Union[dipdup.config.OperationIndexConfig, dipdup.config.BigMapIndexConfig, dipdup.config.HeadIndexConfig, dipdup.config.TokenTransferIndexConfig]] = <factory>, jobs: typing.Dict[str, dipdup.config.JobConfig] = <factory>, hooks: typing.Dict[str, dipdup.config.HookConfig] = <factory>, hasura: typing.Optional[dipdup.config.HasuraConfig] = None, sentry: typing.Optional[dipdup.config.SentryConfig] = None, prometheus: typing.Optional[dipdup.config.PrometheusConfig] = None, advanced: dipdup.config.AdvancedConfig = AdvancedConfig(reindex={}, scheduler=None, postpone_jobs=False, early_realtime=False, merge_subscriptions=False, metadata_interface=False, skip_version_check=False), custom: typing.Dict[str, typing.Any] = <factory>)¶
Main indexer config
- Parameters
spec_version – Version of specification
package – Name of indexer’s Python package, existing or not
datasources – Mapping of datasource aliases and datasource configs
database – Database config
contracts – Mapping of contract aliases and contract configs
indexes – Mapping of index aliases and index configs
templates – Mapping of template aliases and index templates
jobs – Mapping of job aliases and job configs
hooks – Mapping of hook aliases and hook configs
hasura – Hasura integration config
sentry – Sentry integration config
prometheus – Prometheus integration config
advanced – Advanced config
custom – User-defined Custom config
- property oneshot: bool¶
Whether all indexes have last_level field set
- property package_path: str¶
Absolute path to the indexer package, existing or default
- property per_index_rollback: bool¶
Check if package has on_index_rollback hook
- class dipdup.config.HTTPConfig(cache: Optional[bool] = None, retry_count: Optional[int] = None, retry_sleep: Optional[float] = None, retry_multiplier: Optional[float] = None, ratelimit_rate: Optional[int] = None, ratelimit_period: Optional[int] = None, connection_limit: Optional[int] = None, connection_timeout: Optional[int] = None, batch_size: Optional[int] = None)¶
Advanced configuration of HTTP client
- Parameters
cache – Whether to cache responses
retry_count – Number of retries after request failed before giving up
retry_sleep – Sleep time between retries
retry_multiplier – Multiplier for sleep time between retries
ratelimit_rate – Number of requests per period (“drops” in leaky bucket)
ratelimit_period – Time period for rate limiting in seconds
connection_limit – Number of simultaneous connections
connection_timeout – Connection timeout in seconds
batch_size – Number of items fetched in a single paginated request (for some APIs)
- merge(other: Optional[dipdup.config.HTTPConfig]) dipdup.config.HTTPConfig ¶
Set missing values from other config
- class dipdup.config.HandlerConfig(callback: str)¶
- class dipdup.config.HasuraConfig(url: str, admin_secret: Optional[str] = None, source: str = 'default', select_limit: int = 100, allow_aggregations: bool = True, camel_case: bool = False, rest: bool = True, http: Optional[dipdup.config.HTTPConfig] = None)¶
Config for the Hasura integration.
- Parameters
url – URL of the Hasura instance.
admin_secret – Admin secret of the Hasura instance.
source – Hasura source for DipDup to configure, others will be left untouched.
select_limit – Row limit for unauthenticated queries.
allow_aggregations – Whether to allow aggregations in unauthenticated queries.
camel_case – Whether to use camelCase instead of default pascal_case for the field names (incompatible with metadata_interface flag)
rest – Enable REST API both for autogenerated and custom queries.
http – HTTP connection tunables
- property headers: Dict[str, str]¶
Headers to include with every request
- class dipdup.config.HeadHandlerConfig(callback: str)¶
Head block handler config
- class dipdup.config.HeadIndexConfig(kind: Literal['head'], datasource: Union[str, dipdup.config.TzktDatasourceConfig], handlers: Tuple[dipdup.config.HeadHandlerConfig, ...])¶
Head block index config
- class dipdup.config.HookConfig(callback: str, args: typing.Dict[str, str] = <factory>, atomic: bool = False)¶
Hook config
- Parameters
args – Mapping of argument names and annotations (checked lazily when possible)
atomic – Wrap hook in a single database transaction
- class dipdup.config.HttpDatasourceConfig(kind: Literal['http'], url: str, http: Optional[dipdup.config.HTTPConfig] = None)¶
Generic HTTP datasource config
- Parameters
kind – always ‘http’
url – URL to fetch data from
http – HTTP client configuration
- class dipdup.config.IndexConfig(kind: str, datasource: Union[str, dipdup.config.TzktDatasourceConfig])¶
Index config
- Parameters
datasource – Alias of index datasource in datasources section
- hash() str ¶
Calculate hash to ensure config has not changed since last run.
- class dipdup.config.IndexTemplateConfig(template: str, values: Dict[str, str], first_level: int = 0, last_level: int = 0)¶
Index template config
- Parameters
kind – always template
name – Name of index template
template_values – Values to be substituted in template (<key> -> value)
first_level – Level to start indexing from
last_level – Level to stop indexing at (DipDup will terminate at this level)
- class dipdup.config.IpfsDatasourceConfig(kind: Literal['ipfs'], url: str = 'https://ipfs.io/ipfs', http: Optional[dipdup.config.HTTPConfig] = None)¶
IPFS datasource config
- Parameters
kind – always ‘ipfs’
url – IPFS node URL, e.g. https://ipfs.io/ipfs/
http – HTTP client configuration
- class dipdup.config.JobConfig(hook: typing.Union[str, dipdup.config.HookConfig], crontab: typing.Optional[str] = None, interval: typing.Optional[int] = None, daemon: bool = False, args: typing.Dict[str, typing.Any] = <factory>)¶
Job schedule config
- Parameters
hook – Name of hook to run
crontab – Schedule with crontab syntax (* * * * *)
interval – Schedule with interval in seconds
daemon – Run hook as a daemon (never stops)
args – Arguments to pass to the hook
- class dipdup.config.LoggingConfig(config: Dict[str, Any])¶
- class dipdup.config.MetadataDatasourceConfig(kind: Literal['metadata'], network: dipdup.datasources.metadata.enums.MetadataNetwork, url: str = 'https://metadata.dipdup.net', http: Optional[dipdup.config.HTTPConfig] = None)¶
DipDup Metadata datasource config
- Parameters
kind – always ‘metadata’
network – Network name, e.g. mainnet, hangzhounet, etc.
url – GraphQL API URL, e.g. https://metadata.dipdup.net
http – HTTP client configuration
- class dipdup.config.NameMixin¶
- class dipdup.config.OperationHandlerConfig(callback: str, pattern: Tuple[Union[dipdup.config.OperationHandlerOriginationPatternConfig, dipdup.config.OperationHandlerTransactionPatternConfig], ...])¶
Operation handler config
- Parameters
callback – Name of method in handlers package
pattern – Filters to match operation groups
- class dipdup.config.OperationHandlerOriginationPatternConfig(type: Literal['origination'] = 'origination', source: Optional[Union[str, dipdup.config.ContractConfig]] = None, similar_to: Optional[Union[str, dipdup.config.ContractConfig]] = None, originated_contract: Optional[Union[str, dipdup.config.ContractConfig]] = None, optional: bool = False, strict: bool = False)¶
Origination handler pattern config
- Parameters
type – always ‘origination’
source – Match operations by source contract alias
similar_to – Match operations which have the same code/signature (depending on strict field)
originated_contract – Match origination of exact contract
optional – Whether can operation be missing in operation group
strict – Match operations by storage only or by the whole code
- class dipdup.config.OperationHandlerTransactionPatternConfig(type: Literal['transaction'] = 'transaction', source: Optional[Union[str, dipdup.config.ContractConfig]] = None, destination: Optional[Union[str, dipdup.config.ContractConfig]] = None, entrypoint: Optional[str] = None, optional: bool = False)¶
Operation handler pattern config
- Parameters
type – always ‘transaction’
source – Match operations by source contract alias
destination – Match operations by destination contract alias
entrypoint – Match operations by contract entrypoint
optional – Whether can operation be missing in operation group
- class dipdup.config.OperationIndexConfig(kind: typing.Literal['operation'], datasource: typing.Union[str, dipdup.config.TzktDatasourceConfig], handlers: typing.Tuple[dipdup.config.OperationHandlerConfig, ...], types: typing.Tuple[dipdup.enums.OperationType, ...] = (<OperationType.transaction: 'transaction'>,), contracts: typing.List[typing.Union[str, dipdup.config.ContractConfig]] = <factory>, first_level: int = 0, last_level: int = 0)¶
Operation index config
- Parameters
kind – always operation
handlers – List of indexer handlers
types – Types of transaction to fetch
contracts – Aliases of contracts being indexed in contracts section
first_level – Level to start indexing from
last_level – Level to stop indexing at (DipDup will terminate at this level)
- property address_filter: Set[str]¶
Set of addresses (any field) to filter operations with before an actual matching
- property entrypoint_filter: Set[Optional[str]]¶
Set of entrypoints to filter operations with before an actual matching
- class dipdup.config.ParameterTypeMixin¶
parameter_type_cls field
- class dipdup.config.ParentMixin¶
parent field for index and template configs
- class dipdup.config.PatternConfig¶
- class dipdup.config.PostgresDatabaseConfig(kind: typing.Literal['postgres'], host: str, user: str = 'postgres', database: str = 'postgres', port: int = 5432, schema_name: str = 'public', password: str = '', immune_tables: typing.Tuple[str, ...] = <factory>, connection_timeout: int = 60)¶
Postgres database connection config
- Parameters
kind – always ‘postgres’
host – Host
port – Port
user – User
password – Password
database – Database name
schema_name – Schema name
immune_tables – List of tables to preserve during reindexing
connection_timeout – Connection timeout
- class dipdup.config.PrometheusConfig(host: str, port: int = 8000, update_interval: float = 1.0)¶
Config for Prometheus integration.
- Parameters
host – Host to bind to
port – Port to bind to
update_interval – Interval to update some metrics in seconds
- class dipdup.config.SentryConfig(dsn: str, environment: Optional[str] = None, debug: bool = False)¶
Config for Sentry integration.
- Parameters
dsn – DSN of the Sentry instance
environment – Environment to report to Sentry (informational only)
debug – Catch warning messages and more context
- class dipdup.config.SqliteDatabaseConfig(kind: Literal['sqlite'], path: str = ':memory:')¶
SQLite connection config
- Parameters
kind – always ‘sqlite’
path – Path to .sqlite3 file, leave default for in-memory database (:memory:)
- class dipdup.config.StorageTypeMixin¶
storage_type_cls field
- class dipdup.config.SubscriptionsMixin¶
subscriptions field
- class dipdup.config.TemplateValuesMixin¶
template_values field
- class dipdup.config.TokenTransferHandlerConfig(callback: str)¶
- class dipdup.config.TokenTransferIndexConfig(kind: typing.Literal['token_transfer'], datasource: typing.Union[str, dipdup.config.TzktDatasourceConfig], handlers: typing.Tuple[dipdup.config.TokenTransferHandlerConfig, ...] = <factory>, first_level: int = 0, last_level: int = 0)¶
Token index config
- class dipdup.config.TransactionIdxMixin¶
transaction_idx field to track index of operation in group
- Parameters
transaction_idx –
- class dipdup.config.TzktDatasourceConfig(kind: Literal['tzkt'], url: str, http: Optional[dipdup.config.HTTPConfig] = None, buffer_size: int = 0)¶
TzKT datasource config
- Parameters
kind – always ‘tzkt’
url – Base API URL, e.g. https://api.tzkt.io/
http – HTTP client configuration
buffer_size – Number of levels to keep in FIFO buffer before processing
advanced
advanced:
early_realtime: False
merge_subscriptions: False
postpone_jobs: False
reindex:
manual: wipe
migration: exception
rollback: ignore
config_modified: exception
schema_modified: exception
This config section allows users to tune some system-wide options, either experimental or unsuitable for generic configurations.
field | description |
---|---|
reindex | Mapping of reindexing reasons and actions DipDup performs |
scheduler | apscheduler scheduler config |
postpone_jobs | Do not start job scheduler until all indexes are in realtime state |
early_realtime | Establish realtime connection immediately after startup |
merge_subscriptions | Subscribe to all operations instead of exact channels |
metadata_interface | Expose metadata interface for TzKT |
CLI flags take priority over the `AdvancedConfig` fields of the same name.
🤓 SEE ALSO
contracts
A list of the contract definitions you might use in the indexer patterns or templates. Each contract entry has two fields:
- `address` — either originated or implicit account address encoded in base58.
- `typename` — an alias for the particular contract script, meaning that two contracts sharing the same code can have the same type name.
contracts:
kusd_dex_mainnet:
address: KT1CiSKXR68qYSxnbzjwvfeMCRburaSDonT2
typename: quipu_fa12
tzbtc_dex_mainnet:
address: KT1N1wwNPqT5jGhM91GQ2ae5uY8UzFaXHMJS
typename: quipu_fa12
kusd_token_mainnet:
address: KT1K9gCRgaLRFKTErYt1wVxA3Frb9FjasjTV
typename: kusd_token
tzbtc_token_mainnet:
address: KT1PWx2mnDueood7fEmfbBDKx1D9BAnnXitn
typename: tzbtc_token
A typename
field is only required when using index templates, but it helps to improve the readability of auto-generated code and avoid repetition.
A contract entry does not contain any information about the network, so it's a good idea to include the network name in the alias. This design choice makes generic index parameterization via templates possible. See 4.5. Templates and variables for details.
If multiple contracts you index have the same interface but different code, see 8.2. Reusing typename for different contracts.
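For reference, contract entries declared here are available at runtime through the config object. A minimal sketch (the callback name is hypothetical; the alias comes from the example above):

from dipdup.context import HandlerContext


async def some_callback(ctx: HandlerContext) -> None:
    # Look up a contract declared in the `contracts` section by its alias.
    contract = ctx.config.contracts['tzbtc_token_mainnet']
    ctx.logger.info('address=%s typename=%s', contract.address, contract.typename)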
database
DipDup supports several database engines for development and production. The obligatory field kind
specifies which engine has to be used:
- `sqlite`
- `postgres` (and compatible engines)
Database engines article may help you choose a database that better suits your needs.
SQLite
The `path` field must be either a path to the .sqlite3 file or `:memory:` to keep the database in memory only (default):
database:
kind: sqlite
path: db.sqlite3
field | description |
---|---|
kind | always 'sqlite' |
path | Path to .sqlite3 file, leave default for in-memory database |
PostgreSQL
Requires host
, port
, user
, password
, and database
fields. You can set schema_name
to values other than public
, but Hasura integration won't be available.
database:
kind: postgres
host: db
port: 5432
user: dipdup
password: ${POSTGRES_PASSWORD:-changeme}
database: dipdup
schema_name: public
field | description |
---|---|
kind | always 'postgres' |
host | Host |
port | Port |
user | User |
password | Password |
database | Database name |
schema_name | Schema name |
immune_tables | List of tables to preserve during reindexing |
connection_timeout | Connection timeout in seconds |
You can also use compose-style environment variable substitutions with default values for secrets and other fields. See Templates and variables for details.
Immune tables
In some cases, DipDup can't continue indexing with an existing database. See 5.3. Reindexing for details. One of the solutions to resolve reindexing state is to drop the database and start indexing from scratch. To achieve this, either invoke schema wipe
command or set an action to wipe
in advanced.reindex
config section.
You might want to keep several tables during a schema wipe if the data in them does not depend on the index state but is heavy to recompute or refetch. A typical example is indexing IPFS data: rollbacks do not affect off-chain storage, so you can safely continue after receiving a reorg message.
database:
immune_tables:
- token_metadata
- contract_metadata
immune_tables
is an optional array of table names that will be ignored during schema wipe. Note that to change the schema of an immune table, you need to perform a migration by yourself. DipDup will neither drop the table nor automatically handle the update.
datasources
A list of API endpoints DipDup uses to retrieve indexing data to process.
A datasource config entry is an alias for the endpoint URI; there's no network mention. Thus it's good to add a network name to the datasource alias, e.g. tzkt_mainnet
.
tzkt
datasources:
tzkt:
kind: tzkt
url: ${TZKT_URL:-https://api.tzkt.io}
http:
cache: false
retry_count: # retry infinitely
retry_sleep:
retry_multiplier:
ratelimit_rate:
ratelimit_period:
connection_limit: 100
connection_timeout: 60
batch_size: 10000
buffer_size: 0
coinbase
datasources:
coinbase:
kind: coinbase
dipdup-metadata
datasources:
metadata:
kind: metadata
url: https://metadata.dipdup.net
network: mainnet|hangzhounet
ipfs
datasources:
ipfs:
kind: ipfs
url: https://ipfs.io/ipfs
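For illustration, an `ipfs` datasource can be used from any callback via the `ctx.get_<kind>_datasource` helpers mentioned in the changelog. A hedged sketch; the callback name, the CID, and the `get` method shown here are assumptions:

from dipdup.context import HandlerContext


async def resolve_token_metadata(ctx: HandlerContext) -> None:
    ipfs = ctx.get_ipfs_datasource('ipfs')
    # Fetch a JSON document by its CID (placeholder value below); `get` is assumed
    # to be the download method exposed by the IPFS datasource.
    metadata = await ipfs.get('QmSomeHash/metadata.json')
    ctx.logger.info('metadata: %s', metadata)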
🤓 SEE ALSO
hasura
This optional section is used by the DipDup executor to automatically configure the Hasura engine to track your tables.
hasura:
url: http://hasura:8080
admin_secret: ${HASURA_ADMIN_SECRET:-changeme}
allow_aggregations: false
camel_case: true
rest: true
select_limit: 100
source: default
🤓 SEE ALSO
hooks
Hooks are user-defined callbacks you can execute with a job scheduler or within another callback (with ctx.fire_hook
).
hooks:
calculate_stats:
callback: calculate_stats
atomic: False
args:
major: bool
depth: int
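For the config above, the callback stub in `<package>/hooks/calculate_stats.py` would look roughly like this; argument names mirror the `args` mapping, and the body is up to you:

from dipdup.context import HookContext


async def calculate_stats(
    ctx: HookContext,
    major: bool,
    depth: int,
) -> None:
    # Generated stubs typically execute the SQL scripts from sql/<hook_name>;
    # replace or extend this with your own logic.
    await ctx.execute_sql('calculate_stats')

You can also trigger it from another callback with `await ctx.fire_hook('calculate_stats', major=True, depth=10)`.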
🤓 SEE ALSO
indexes
An index is a basic DipDup entity connecting the inventory and specifying data handling rules.
Each index has a unique string identifier acting as a key under indexes
config section:
indexes:
my_index:
kind: operation
datasource: tzkt_mainnet
There can be various index kinds; currently, two possible options are supported for the `kind` field:
- `operation`
- `big_map`
All the indexes have to specify the datasource
field, an alias of an existing entry under the datasources section.
Indexing scope
One can optionally specify block levels DipDup has to start and stop indexing at; for example, when there's a new version of the contract, it may be more efficient to stop handling the old one.
indexes:
my_index:
first_level: 1000000
last_level: 2000000
big_map
big_map index allows querying only updates of a specific big map (or several). In some cases, it can drastically reduce the amount of data transferred and speed up the indexing process.
indexes:
my_index:
kind: big_map
datasource: tzkt
skip_history: never
handlers:
- callback: on_ledger_update
contract: contract1
path: data.ledger
- callback: on_token_metadata_update
contract: contract1
path: token_metadata
Handlers
Each big_map handler contains three required fields:
- `callback` — name of the async function with a particular signature; DipDup will try to load it from the module with the same name `<package_name>.handlers.<callback>`
- `contract` — Big map parent contract (from the inventory)
- `path` — path to the Big map in the contract storage (use dot as a delimiter)
Index only the current state of big maps
When skip_history
field is set to once
, DipDup will skip historical changes only on initial sync and switch to regular indexing afterward. When the value is always
, DipDup will fetch all big map keys on every restart. The preferable mode depends on your workload.
All big map diffs DipDup passes to handlers during fast sync have the `action` field set to `BigMapAction.ADD_KEY`. Keep in mind that DipDup fetches all keys in this mode, including ones removed from the big map. You can filter out the latter using the `BigMapDiff.data.active` field if needed.
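To illustrate, a handler for the `on_ledger_update` callback from the example above could look like the following sketch; the key/value typeclass names and import paths are placeholders for what `dipdup init` generates:

from dipdup.context import HandlerContext
from dipdup.models import BigMapDiff

# Placeholder imports: the actual modules are generated by `dipdup init`
# under <package>/types/<typename>/.
from indexer.types.contract1.big_map.data_ledger_key import DataLedgerKey      # assumption
from indexer.types.contract1.big_map.data_ledger_value import DataLedgerValue  # assumption


async def on_ledger_update(
    ctx: HandlerContext,
    ledger_update: BigMapDiff[DataLedgerKey, DataLedgerValue],
) -> None:
    # With `skip_history` enabled, removed keys are also delivered during fast sync;
    # they can be filtered out by checking the raw diff data.
    if not ledger_update.data.active:
        return
    if ledger_update.key is None or ledger_update.value is None:
        return
    ...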
head
🚧 UNDER CONSTRUCTION
This page or paragraph is yet to be written. Come back later.
operation
Operation index allows you to query only those operations related to your DApp and do pattern matching on their content (the internal call chain). It is the closest thing to fully-fledged event logs.
Filters
DipDup supports filtering operations by kind
, source
, destination
(if applicable), and originated_contract
(if applicable).
DipDup fetches only applied operations.
contracts
indexes:
my_index:
kind: operation
datasource: tzkt
contracts:
- contract1
- contract2
In this example, DipDup will fetch all the operations where either the source or the destination equals the contract1 or contract2 address. The `contracts` field is obligatory; there has to be at least one contract alias (from the inventory).
types
By default, DipDup works only with transactions, but you can explicitly list operation types you want to subscribe to (currently transaction
and origination
types are supported):
indexes:
my_index:
kind: operation
datasource: tzkt
contracts:
- contract1
types:
- transaction
- origination
Note that in the case of originations, DipDup will query operations where either source or originated contract address is equal to contract1.
Handlers
Each operation handler contains two required fields:
- `callback` — name of the async function with a particular signature; DipDup will try to load it from the module with the same name `<package_name>.handlers.<callback>`
- `pattern` — a non-empty list of items that have to be matched
indexes:
my_index:
kind: operation
datasource: tzkt
contracts:
- contract1
handlers:
- callback: on_call
pattern:
- destination: contract1
entrypoint: call
You can think of an operation pattern as a regular expression on a sequence of operations (both external and internal) with the global flag enabled (there can be multiple matches), where various operation parameters (type, source, destination, entrypoint, originated contract) are used for matching.
Pattern
Here are the supported filters for matching operations (all optional):
- `type` — either transaction or origination; usually inferred from the existence of other fields
- `destination` — invoked contract alias (from the inventory)
- `entrypoint` — invoked entrypoint name
- `source` — operation sender alias (from the inventory)
- `originated_contract` — originated contract alias (from the inventory)
- `similar_to` — originated contract has the same parameter and storage types as the reference one (from the inventory)
- `strict` — makes the `similar_to` filter stricter by comparing the entire code rather than just parameter+storage
- `optional` — continue matching even if this item is not found (with limitations, see below)
It's unnecessary to match the entire operation content; you can skip external/internal calls that are not relevant. However, there is a limitation: optional items cannot be followed by operations ignored by the pattern.
pattern:
- destination: contract_1
entrypoint: call_1
- destination: contract_2
entrypoint: internal_call_2
- source: contract_1
type: transaction
- source: contract_2
type: origination
similar_to: contract_3
strict: true
optional: true
You will get slightly different callback argument types depending on whether you specify destination + entrypoint for transactions and originated_contract for originations. Namely, in the first case, DipDup will generate dataclasses for the particular entrypoint/storage, and in the second case it will not (meaning you will have to handle untyped parameters/storage updates).
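For the typed case, the `on_call` callback from the handlers example above would receive a typed `Transaction`. A sketch, where the typeclass names and import paths are placeholders for what `dipdup init` generates:

from dipdup.context import HandlerContext
from dipdup.models import Transaction

# Placeholder imports: actual modules depend on your package name
# and the contract's `typename`.
from indexer.types.contract1.parameter.call import CallParameter  # assumption
from indexer.types.contract1.storage import Contract1Storage      # assumption


async def on_call(
    ctx: HandlerContext,
    call: Transaction[CallParameter, Contract1Storage],
) -> None:
    # `call.parameter` and `call.storage` are typed according to the generated classes;
    # `call.data` holds raw operation metadata (hash, level, timestamp, and so on).
    ctx.logger.info('call matched at level %s', call.data.level)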
template
This index type is used for creating a static template instance.
indexes:
my_index:
template: my_template
values:
placeholder1: value1
placeholder2: value2
For a static template instance (specified in the DipDup config) there are two fields:
- `template` — template name (from the templates section)
- `values` — concrete values for each placeholder used in the chosen template
jobs
Add the following section to DipDup config:
jobs:
midnight_stats:
hook: calculate_stats
crontab: "0 0 * * *"
args:
major: True
leet_stats:
hook: calculate_stats
interval: 1337 # in seconds
args:
major: False
If you're not familiar with the crontab syntax, there's an online service crontab.guru that will help you build the desired expression.
package
DipDup uses this field to discover the Python package of your project.
package: my_indexer_name
DipDup will search for a package named `my_indexer_name` on `PYTHONPATH`. This field allows decoupling the DipDup configuration file from the indexer implementation and gives more flexibility in managing the source code.
See 4.4. Project structure for details.
prometheus
prometheus:
host: 0.0.0.0
Prometheus integration options
field | description |
---|---|
host | Host to bind to |
port | Port to bind to |
update_interval | Interval to update some metrics in seconds |
sentry
sentry:
dsn: https://...
environment: dev
debug: False
field | description |
---|---|
dsn | DSN of the Sentry instance |
environment | Environment to report to Sentry (informational only) |
debug | Catch warning messages and more context |
🤓 SEE ALSO
spec_version
The DipDup specification version determines the compatibility between the framework and the configuration file format and features.
spec_version: 1.2
This table shows which spec_version values are supported by which DipDup releases.
spec_version value | Supported DipDup versions |
---|---|
0.1 | >=0.0.1, <= 0.4.3 |
1.0 | >=1.0.0, <=1.1.2 |
1.1 | >=2.0.0, <=2.0.9 |
1.2 | >=3.0.0 |
If you're getting MigrationRequiredError
after updating the framework, run dipdup migrate
command to perform project migration.
templates
indexes:
foo:
kind: template
name: bar
first_level: 12341234
template_values:
network: mainnet
templates:
bar:
kind" index
datasource: tzkt_<network> # resolves into `tzkt_mainnet`
...
field | description |
---|---|
kind | always template |
name | Name of index template |
template_values | Values to be substituted in template (<key> → value ) |
first_level | Level to start indexing from |
last_level | Level to stop indexing at (DipDup will terminate at this level) |
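Templates can also be instantiated at runtime from a callback. A hedged sketch using `ctx.add_index`; the callback name is hypothetical, and the template name and values follow the example above:

from dipdup.context import HandlerContext


async def on_factory_origination(ctx: HandlerContext) -> None:
    # Spawn a new index from the `bar` template; placeholders in the template
    # are substituted with the provided values.
    await ctx.add_index(
        name='foo_mainnet',
        template='bar',
        values={'network': 'mainnet'},
    )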
Changelog
5.1.7 - 2022-06-15
Fixed
- index: Fixed
token_transfer
index not receiving realtime updates.
5.1.6 - 2022-06-08
Fixed
- cli: Commands with `--help` option no longer require a working DipDup config.
- index: Fixed crash with `RuntimeError` after continuous realtime connection loss.
Performance
- cli: Lazy import dependencies to speed up startup.
Other
- docs: Migrate docs from GitBook to mdbook.
5.1.5 - 2022-06-05
Fixed
- config: Fixed crash when rollback hook is about to be called.
5.1.4 - 2022-06-02
Fixed
- config: Fixed `OperationIndexConfig.types` field being partially ignored.
- index: Allow mixing oneshot and regular indexes in a single config.
- index: Call rollback hook instead of triggering reindex when single-level rollback has failed.
- index: Fixed crash with `RuntimeError` after continuous realtime connection loss.
- tzkt: Fixed `origination` subscription missing when `merge_subscriptions` flag is set.
Performance
- ci: Decrease the size of generic and
-pytezos
Docker images by 11% and 16%, respectively.
5.1.3 - 2022-05-26
Fixed
- database: Fixed special characters in password not being URL encoded.
Performance
- context: Do not reinitialize config when adding a single index.
5.1.2 - 2022-05-24
Added
- tzkt: Added `originated_contract_tzips` field to `OperationData`.
Fixed
- jobs: Fixed jobs with `daemon` schedule never start.
- jobs: Fixed failed jobs not throwing exceptions into the main loop.
Other
- database: Tortoise ORM updated to
0.19.1
.
5.1.1 - 2022-05-13
Fixed
- index: Ignore indexes with different message types on rollback.
- metadata: Add
ithacanet
to available networks.
5.1.0 - 2022-05-12
Added
- ci: Push `X` and `X.Y` tags to the Docker Hub on release.
- cli: Added `config env` command to export env-file with default values.
- cli: Show warning when running an outdated version of DipDup.
- hooks: Added a new hook `on_index_rollback` to perform per-index rollbacks.
Fixed
- index: Fixed fetching `migration` operations.
- tzkt: Fixed possible data corruption when using the `buffer_size` option.
- tzkt: Fixed reconnection due to `websockets` message size limit.
Deprecated
- hooks: The `on_rollback` default hook is superseded by `on_index_rollback` and will be removed later.
5.0.4 - 2022-05-05
Fixed
- exceptions: Fixed incorrect formatting and broken links in help messages.
- index: Fixed crash when the only index in config is `head`.
- index: Fixed fetching originations during the initial sync.
5.0.3 - 2022-05-04
Fixed
- index: Fixed crash when no block with the same level arrived after a single-level rollback.
- index: Fixed setting initial index level when `IndexConfig.first_level` is set.
- tzkt: Fixed delayed emitting of buffered realtime messages.
- tzkt: Fixed inconsistent behavior of `first_level`/`last_level` arguments in different getter methods.
5.0.2 - 2022-04-21
Fixed
- context: Fixed reporting incorrect reindexing reason.
- exceptions: Fixed crash with `FrozenInstanceError` when an exception is raised from a callback.
- jobs: Fixed graceful shutdown of daemon jobs.
Improved
- codegen: Refined `on_rollback` hook template.
- exceptions: Updated help messages for known exceptions.
- tzkt: Do not request reindexing if missing subgroups have matched no handlers.
5.0.1 - 2022-04-12
Fixed
- cli: Fixed `schema init` command crash with SQLite databases.
- index: Fixed spawning datasources in oneshot mode.
- tzkt: Fixed processing realtime messages.
5.0.0 - 2022-04-08
This release contains no changes except for the version number.
5.0.0-rc4 - 2022-04-04
Added
- tzkt: Added ability to process realtime messages with lag.
4.2.7 - 2022-04-02
Fixed
- config: Fixed `jobs` config section validation.
- hasura: Fixed metadata generation for v2.3.0 and above.
- tzkt: Fixed `get_originated_contracts` and `get_similar_contracts` methods response.
5.0.0-rc3 - 2022-03-28
Added
- config: Added
custom
section to store arbitrary user data.
Fixed
- config: Fixed default SQLite path (`:memory:`).
- tzkt: Fixed pagination in several getter methods.
- tzkt: Fixed data loss when `skip_history` option is enabled.
Removed
- config: Removed dummy `advanced.oneshot` flag.
- cli: Removed `docker init` command.
- cli: Removed dummy `schema approve --hashes` flag.
5.0.0-rc2 - 2022-03-13
Fixed
- tzkt: Fixed crash in methods that do not support cursor pagination.
- prometheus: Fixed invalid metric labels.
5.0.0-rc1 - 2022-03-02
Added
- metadata: Added `metadata_interface` feature flag to expose metadata in TzKT format.
- prometheus: Added ability to expose Prometheus metrics.
- tzkt: Added missing fields to the `HeadBlockData` model.
- tzkt: Added `iter_...` methods to iterate over item batches.
Fixed
- tzkt: Fixed possible OOM while calling methods that support pagination.
- tzkt: Fixed possible data loss in `get_originations` and `get_quotes` methods.
Changed
- tzkt: Added `offset` and `limit` arguments to all methods that support pagination.
Removed
- bcd: Removed
bcd
datasource and config section.
Performance
- dipdup: Use fast `orjson` library instead of built-in `json` where possible.
4.2.6 - 2022-02-25
Fixed
- database: Fixed generating table names from uppercase model names.
- http: Fixed bug that leads to caching invalid responses on the disk.
- tzkt: Fixed processing realtime messages with data from multiple levels.
4.2.5 - 2022-02-21
Fixed
- database: Do not add the `schema` argument to the PostgreSQL connection string when not needed.
- hasura: Wait for Hasura to be configured before starting indexing.
4.2.4 - 2022-02-14
Added
- config: Added `http` datasource for making arbitrary HTTP requests.
Fixed
- context: Fixed crash when calling `fire_hook` method.
- context: Fixed `HookConfig.atomic` flag, which was ignored in `fire_hook` method.
- database: Create missing tables even if `Schema` model is present.
- database: Fixed excess increasing of `decimal` context precision.
- index: Fixed loading handler callbacks from nested packages (@veqtor).
Other
- ci: Added GitHub Action to build and publish Docker images for each PR opened.
4.2.3 - 2022-02-08
Fixed
- ci: Removed `black 21.12b0` dependency since the bug in `datamodel-codegen-generator` is fixed.
- cli: Fixed `config export` command crash when `advanced.reindex` dictionary is present.
- cli: Removed optionals from `config export` output so the result can be loaded again.
- config: Verify `advanced.scheduler` config for correctness and unsupported features.
- context: Fixed ignored `wait` argument of `fire_hook` method.
- hasura: Fixed processing relation fields with missing `related_name`.
- jobs: Fixed default `apscheduler` config.
- tzkt: Fixed crash occurring when a reorg message is the first one received by the datasource.
4.2.2 - 2022-02-01
Fixed
- config: Fixed `ipfs` datasource config.
4.2.1 - 2022-01-31
Fixed
- ci: Added `black 21.12b0` dependency to avoid possible conflict with `datamodel-codegen-generator`.
4.2.0 - 2022-01-31
Added
- context: Added `wait` argument to `fire_hook` method to escape current transaction context.
- context: Added `ctx.get_<kind>_datasource` helpers to avoid type casting.
- hooks: Added ability to configure `apscheduler` with `AdvancedConfig.scheduler` field.
- http: Added `request` method to send arbitrary requests (affects all datasources).
- ipfs: Added `ipfs` datasource to download JSON and binary data from IPFS.
Fixed
- http: Removed dangerous method `close_session`.
- context: Fixed help message of `IndexAlreadyExistsError` exception.
Deprecated
- bcd: Added deprecation notice.
Other
- dipdup: Removed unused internal methods.
4.1.2 - 2022-01-27
Added
- cli: Added `schema wipe --force` argument to skip confirmation prompt.
Fixed
- cli: Show a warning about the deprecated `--hashes` argument.
- cli: Ignore `SIGINT` signal when shutdown is in progress.
- sentry: Ignore exceptions when shutdown is in progress.
4.1.1 - 2022-01-25
Fixed
- cli: Fixed stacktraces missing on exception.
- cli: Fixed wrapping `OSError` with `ConfigurationError` during config loading.
- hasura: Fixed printing help messages on `HasuraError`.
- hasura: Preserve a list of sources in Hasura Cloud environments.
- hasura: Fixed `HasuraConfig.source` config option.
Changed
- cli: Unknown exceptions are no longer wrapped with `DipDupError`.
Performance
- hasura: Removed some useless requests.
4.1.0 - 2022-01-24
Added
- cli: Added `schema init` command to initialize database schema.
- cli: Added `--force` flag to `hasura configure` command.
- codegen: Added support for subpackages inside callback directories.
- hasura: Added `dipdup_head_status` view and REST endpoint.
- index: Added an ability to skip historical data while synchronizing `big_map` indexes.
- metadata: Added `metadata` datasource.
- tzkt: Added `get_big_map` and `get_contract_big_maps` datasource methods.
4.0.5 - 2022-01-20
Fixed
- index: Fixed deserializing manually modified typeclasses.
4.0.4 - 2022-01-17
Added
- cli: Added `--keep-schemas` flag to `init` command to preserve JSONSchemas along with generated types.
Fixed
- demos: Tezos Domains and Homebase DAO demos were updated from edo2net to mainnet contracts.
- hasura: Fixed missing relations for models with `ManyToManyField` fields.
- tzkt: Fixed parsing storage with nested structures.
Performance
- dipdup: Minor overall performance improvements.
Other
- ci: Cache virtual environment in GitHub Actions.
- ci: Detect CI environment and skip tests that fail in GitHub Actions.
- ci: Execute tests in parallel with `pytest-xdist` when possible.
- ci: More strict linting rules of `flake8`.
4.0.3 - 2022-01-09
Fixed
- tzkt: Fixed parsing parameter with an optional value.
4.0.2 - 2022-01-06
Added
- tzkt: Added optional `delegate_address` and `delegate_alias` fields to `OperationData`.
Fixed
- tzkt: Fixed crash due to unprocessed pysignalr exception.
- tzkt: Fixed parsing `OperationData.amount` field.
- tzkt: Fixed parsing storage with top-level boolean fields.
4.0.1 - 2021-12-30
Fixed
- codegen: Fixed generating storage typeclasses with `Union` fields.
- codegen: Fixed preprocessing contract JSONSchema.
- index: Fixed processing reindexing reason saved in the database.
- tzkt: Fixed processing operations with default entrypoint and empty parameter.
- tzkt: Fixed crash while recursively applying bigmap diffs to the storage.
Performance
- tzkt: Increased speed of applying bigmap diffs to operation storage.
4.0.0 - 2021-12-24
This release contains no changes except for the version number.
4.0.0-rc4 - 2021-12-20
Fixed
- cli: Fixed missing `schema approve --hashes` argument.
- codegen: Fixed contract address used instead of an alias when typename is not set.
- tzkt: Fixed processing operations with entrypoint `default`.
- tzkt: Fixed regression in processing migration originations.
- tzkt: Fixed filtering of big map diffs by the path.
Removed
- cli: Removed deprecated `run --oneshot` argument and `clear-cache` command.
4.0.0-rc2 - 2021-12-11
⚠ Migration
- Run `dipdup init` command to generate `on_synchronized` hook stubs.
Added
- hooks: Added `on_synchronized` hook, which fires each time all indexes reach realtime state.
Fixed
- cli: Fixed config not being verified when invoking some commands.
- codegen: Fixed generating callback arguments for untyped operations.
- index: Fixed incorrect log messages, removed duplicate ones.
- index: Fixed crash while processing storage of some contracts.
- index: Fixed matching of untyped operations filtered by `source` field (@pravin-d).
Performance
- index: Checks performed on each iteration of the main DipDup loop are slightly faster now.
4.0.0-rc1 - 2021-12-02
⚠ Migration
- Run `dipdup schema approve` command on every database you want to use with 4.0.0-rc1. Running `dipdup migrate` is not necessary since `spec_version` hasn't changed in this release.
Added
- cli: Added `run --early-realtime` flag to establish a realtime connection before all indexes are synchronized.
- cli: Added `run --merge-subscriptions` flag to subscribe to all operations/big map diffs during realtime indexing.
- cli: Added `status` command to print the current status of indexes from the database.
- cli: Added `config export [--unsafe]` command to print config after resolving all links and variables.
- cli: Added `cache show` command to get information about file caches used by DipDup.
- config: Added `first_level` and `last_level` optional fields to `TemplateIndexConfig`. These limits are applied after ones from the template itself.
- config: Added `daemon` boolean field to `JobConfig` to run a single callback indefinitely. Conflicts with `crontab` and `interval` fields.
- config: Added `advanced` top-level section.
Fixed
- cli: Fixed crashes and output inconsistency when piping DipDup commands.
- cli: Fixed `schema wipe --immune` flag being ignored.
- codegen: Fixed missing imports in handlers generated during init.
- coinbase: Fixed possible data inconsistency caused by caching enabled for method `get_candles`.
- http: Fixed increasing sleep time between failed request attempts.
- index: Fixed invocation of head index callback.
- index: Fixed `CallbackError` raised instead of `ReindexingRequiredError` in some cases.
- tzkt: Fixed resubscribing when realtime connectivity is lost for a long time.
- tzkt: Fixed sending useless subscription requests when adding indexes in runtime.
- tzkt: Fixed `get_originated_contracts` and `get_similar_contracts` methods whose output was limited to `HTTPConfig.batch_size` field.
- tzkt: Fixed lots of SignalR bugs by replacing `aiosignalrcore` library with `pysignalr`.
Changed
- cli: `dipdup schema wipe` command now requires confirmation when invoked in the interactive shell.
- cli: `dipdup schema approve` command now also causes a recalculation of schema and index config hashes.
- index: DipDup will recalculate respective hashes if reindexing is triggered with `config_modified: ignore` or `schema_modified: ignore` in advanced config.
Deprecated
- cli: `run --oneshot` option is deprecated and will be removed in the next major release. The oneshot mode applies automatically when `last_level` field is set in the index config.
- cli: `clear-cache` command is deprecated and will be removed in the next major release. Use `cache clear` command instead.
Performance
- config: Configuration files are loaded 10x times faster.
- index: Number of operations processed by matcher reduced by 40%-95% depending on the number of addresses and entrypoints used.
- tzkt: Rate limit was increased. Try to set `connection_timeout` to a higher value if requests fail with `ConnectionTimeout` exception.
- tzkt: Improved performance of response deserialization.
3.1.3 - 2021-11-15
Fixed
- codegen: Fixed missing imports in operation handlers.
- codegen: Fixed invalid imports and arguments in big_map handlers.
3.1.2 - 2021-11-02
Fixed
- Fixed crash that occurred during synchronization of big map indexes.
3.1.1 - 2021-10-18
Fixed
- Fixed loss of realtime subscriptions that occurred after a TzKT API outage.
- Fixed updating schema hash in `schema approve` command.
- Fixed possible crash occurring while Hasura is not ready.
3.1.0 - 2021-10-12
Added
- New index class `HeadIndex` (configuration: `dipdup.config.HeadIndexConfig`). Use this index type to handle head (limited block header content) updates. This index type is realtime-only: historical data won't be indexed during the synchronization stage.
- Added three new commands: `schema approve`, `schema wipe`, and `schema export`. Run `dipdup schema --help` command for details.
Changed
- Triggering reindexing won't lead to dropping the database automatically anymore. `ReindexingRequiredError` is raised instead. `--forbid-reindexing` option has become the default.
- `--reindex` option is removed. Use `dipdup schema wipe` instead.
- Values of `dipdup_schema.reindex` field updated to simplify querying the database. See `dipdup.enums.ReindexingReason` class for possible values.
Fixed
- Fixed `ReindexRequiredError` not being raised when running DipDup after reindexing was triggered.
- Fixed index config hash calculation. Hashes of existing indexes in a database will be updated during the first run.
- Fixed issue in `BigMapIndex` causing the partial loss of big map diffs.
- Fixed printing help for CLI commands.
- Fixed merging storage which contains specific nested structures.
Improved
- Raise `DatabaseConfigurationError` exception when project models are not compatible with GraphQL.
- Another bunch of performance optimizations. Reduced DB pressure, sped up parallel processing of lots of indexes.
- Added an initial set of performance benchmarks (run: `./scripts/run_benchmarks.sh`)
3.0.4 - 2021-10-04
Improved
- A significant increase in indexing speed.
Fixed
- Fixed unexpected reindexing caused by the bug in processing zero- and single-level rollbacks.
- Removed unnecessary file IO calls that could cause `PermissionError` exception in Docker environments.
- Fixed possible violation of block-level atomicity during realtime indexing.
Changed
- Public methods of `TzktDatasource` now return immutable sequences.
3.0.3 - 2021-10-01
Fixed
- Fixed processing of single-level rollbacks emitted before rolled back head.
3.0.2 - 2021-09-30
Added
- Human-readable `CHANGELOG.md` 🕺
- Two new options added to `dipdup run` command:
  - `--forbid-reindexing` – raise `ReindexingRequiredError` instead of truncating database when reindexing is triggered for any reason. To continue indexing with existing database run `UPDATE dipdup_schema SET reindex = NULL;`
  - `--postpone-jobs` – job scheduler won't start until all indexes are synchronized.
Changed
- Migration to this version requires reindexing. `dipdup_index.head_id` foreign key removed. `dipdup_head` table still contains the latest blocks from Websocket received by each datasource.
Fixed
- Removed unnecessary calls to TzKT API.
- Fixed removal of PostgreSQL extensions (`timescaledb`, `pgcrypto`) by function `truncate_database` triggered on reindex.
- Fixed creation of missing project package on `init`.
- Fixed invalid handler callbacks generated on `init`.
- Fixed detection of existing types in the project.
- Fixed race condition caused by event emitter concurrency.
- Capture unknown exceptions with Sentry before wrapping to `DipDupError`.
- Fixed job scheduler start delay.
- Fixed processing of reorg messages.
3.0.1 - 2021-09-24
Added
- Added `get_quote` and `get_quotes` methods to `TzKTDatasource`.
Fixed
- Defer spawning index datasources until initial sync is complete. It helps to mitigate some WebSocket-related crashes, but initial sync is a bit slower now.
- Fixed possible race conditions in `TzKTDatasource`.
- Start `jobs` scheduler after all indexes sync with a current head to speed up indexing.
Release notes
This section contains information about changes introduced with specific DipDup releases.
5.1.0
⚠ Migration from 5.0 (optional)
- Run `init` command. Now you have two conflicting hooks: `on_rollback` and `on_index_rollback`. Follow the guide below to perform the migration. `ConflictingHooksError` exception will be raised until then.
What's New
Per-index rollback hook
In this release, we continue to improve the rollback-handling experience, which became much more important since the Ithaca protocol reached mainnet. Let's briefly recap how DipDup currently processes chain reorgs before calling a rollback hook:
- If the `buffer_size` option of a TzKT datasource is set to a non-zero value, and there are enough data messages buffered when a rollback occurs, data is just dropped from the buffer, and indexing continues.
- If all indexes in the config are `operation` ones, we can attempt to process a single-level rollback. All operations from the rolled back block must be present in the next one for the rollback to succeed. If some operations are missing, the `on_rollback` hook will be called as usual.
- Finally, we can safely ignore indexes with a level lower than the rollback target. The index level is updated either on synchronization or when at least one related operation or big map diff has been extracted from a realtime message.
If none of these tricks have worked, we can't process a rollback without custom logic. Here's where the changes begin. Before this release, every project contained the `on_rollback` hook, which receives a `datasource: IndexDatasource` argument and from/to levels. Even if your deployment has thousands of indexes and only a couple of them are affected by a rollback, you weren't able to easily find out which ones.
Now the `on_rollback` hook is deprecated and superseded by the `on_index_rollback` one. Choose one of the following options:
- You haven't touched the `on_rollback` hook since project creation. Run `init` command and remove the `hooks/on_rollback` and `sql/on_rollback` directories in the project root. The default action (reindexing) has not changed.
- You have some custom logic in the `on_rollback` hook and want to leave it as-is for now. You can ignore the introduced changes at least till the next major release.
- You have implemented per-datasource rollback logic and are ready to switch to the per-index one. Run `init`, move your code to the `on_index_rollback` hook and delete the `on_rollback` one. Note that you can access the rolled back datasource via `index.datasource`.
Token transfer index
Sometimes implementing an `operation` index is overkill for a specific task. An existing alternative is to use a `big_map` index to process only the diffs of selected big map paths. However, you still need to have a separate index for each contract of interest, which is very resource-consuming. A widespread case is indexing FA1.2/FA2 token contracts. So, this release introduces a new `token_transfer` index:
indexes:
  transfers:
    kind: token_transfer
    datasource: tzkt
    handlers:
      - callback: transfers
The `TokenTransferData` object is passed to the handler on each operation and contains just enough information to process a token transfer.
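A matching handler stub could look like the sketch below; the argument name and the body are illustrative, only `HandlerContext` and `TokenTransferData` are taken from the text above.

```python
from dipdup.context import HandlerContext
from dipdup.models import TokenTransferData  # assumption: exact import path may differ


async def transfers(ctx: HandlerContext, token_transfer: TokenTransferData) -> None:
    # Persist or aggregate the transfer here, e.g. update sender/receiver balances.
    ...
```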
`config env` command to generate env-files
Generally, it's good to separate a project config from deployment parameters, and DipDup has multiple options to achieve this. First of all, multiple configs can be chained successively, overriding top-level sections. Second, the DipDup config can contain docker-compose-style environment variable declarations. Let's say your config contains the following content:
database:
  kind: postgres
  host: db
  port: 5432
  user: ${POSTGRES_USER:-dipdup}
  password: ${POSTGRES_PASSWORD:-changeme}
  database: ${POSTGRES_DB:-dipdup}
You can generate an env-file to use with this exact config:
$ dipdup -c dipdup.yml -c dipdup.docker.yml config env
POSTGRES_USER=dipdup
POSTGRES_PASSWORD=changeme
POSTGRES_DB=dipdup
The environment of your current shell is also taken into account:
$ POSTGRES_DB=foobar dipdup -c dipdup.yml -c dipdup.docker.yml config env
POSTGRES_USER=dipdup
POSTGRES_PASSWORD=changeme
POSTGRES_DB=foobar # <- set from current env
Use the `-f <filename>` option to save the output to a file instead of printing it to stdout. After you have modified the env-file according to your needs, apply it whichever way is more convenient to you:
With the `dipdup --env-file / -e` option:
dipdup -e prod.env <...> run
When using docker-compose:
services:
  indexer:
    ...
    env_file: prod.env
Keeping framework up-to-date
A bunch of new tags are now pushed to the Docker Hub on each release in addition to the `X.Y.Z` one: `X.Y` and `X`. That way, you can stick to a specific release without the risk of leaving a minor/major update unattended (friends don't let friends use `latest` 😉). The `-pytezos` flavor is also available for each tag.
FROM dipdup/dipdup:5.1
...
In addition, DipDup will poll GitHub for new releases on each command that takes a reasonably long time to execute and print a warning when running an outdated version. You can disable these checks with the `advanced.skip_version_check` flag.
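For example, a minimal config snippet to opt out of the check (the field name is taken from the flag above):

```yaml
advanced:
  skip_version_check: True
```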
Pro tip: you can also enable notifications on the GitHub repo page with 👁 Watch -> Custom -> tick Releases -> Apply to never miss a fresh DipDup release.
Changelog
See full 5.1.0 changelog here.
5.0.0
⚠ Breaking Changes
- Python versions 3.8 and 3.9 are no longer supported.
- `bcd` datasource has been removed.
- Two internal tables were added, `dipdup_contract_metadata` and `dipdup_token_metadata`.
- Some methods of `tzkt` datasource have changed their signatures and behavior.
- Dummy `advanced.oneshot` config flag has been removed.
- Dummy `schema approve --hashes` command flag has been removed.
- `docker init` command has been removed.
- `ReindexingReason` enumeration items have been changed.
⚠ Migration from 4.x
- Ensure that you have a `python = "^3.10"` dependency in `pyproject.toml`.
- Remove `bcd` datasources from config. Use `metadata` datasource instead to fetch contract and token metadata.
- Update `tzkt` datasource method calls as described below.
- Run the `dipdup schema approve` command on every database you use with 5.0.0.
- Update usage of `ReindexingReason` enumeration if needed.
What's New
Process realtime messages with lag
Chain reorgs have occurred much more frequently since the Ithaca protocol reached mainnet. The preferable way to deal with rollbacks is the `on_rollback` hook. But if the logic of your indexer is too complex, you can buffer an arbitrary number of levels before processing to avoid reindexing.
datasources:
  tzkt_mainnet:
    kind: tzkt
    url: https://api.tzkt.io
    buffer_size: 2
DipDup tries to remove backtracked operations from the buffer instead of emitting a rollback. Ithaca guarantees operation finality after one block and block finality after two blocks, so to completely avoid reorgs, `buffer_size` should be 2.
BCD API takedown
Better Call Dev API was officially deprecated in February. Thus, it's time for the `bcd` datasource to go. In DipDup, it served the only purpose of fetching contract and token metadata. Now there's a separate `metadata` datasource which does the same thing, but better. If you have used the `bcd` datasource for custom requests, see the How to migrate from BCD to TzKT API article.
TzKT batch request pagination
Historically, most `TzktDatasource` methods had page iteration logic hidden inside. The quantity of items returned by TzKT in a single request is configured in `HTTPConfig.batch_size` and defaults to 10,000. Before this release, three requests would be performed by the `get_big_map` method to fetch 25,000 big map keys, leading to performance degradation and extensive memory usage.
affected method | response size in 4.x | response size in 5.x |
---|---|---|
get_similar_contracts | unlimited | max. datasource.request_limit |
get_originated_contracts | unlimited | max. datasource.request_limit |
get_big_map | unlimited | max. datasource.request_limit |
get_contract_big_maps | unlimited | max. datasource.request_limit |
get_quotes | first datasource.request_limit | max. datasource.request_limit |
All paginated methods now behave the same way. You can either iterate over pages manually or use `iter_...` helpers.
datasource = ctx.get_tzkt_datasource('tzkt_mainnet')
batch_iter = datasource.iter_big_map(
    big_map_id=big_map_id,
    level=last_level,
)
async for key_batch in batch_iter:
    for key in key_batch:
        ...
Metadata interface for TzKT integration
Starting with 5.0 you can store and expose custom contract and token metadata in the same format the DipDup Metadata service does for TZIP-compatible metadata.
Enable this feature with the `advanced.metadata_interface` flag, then update metadata in any callback:
await ctx.update_contract_metadata(
    network='mainnet',
    address='KT1...',
    metadata={'foo': 'bar'},
)
Metadata is stored in the `dipdup_contract_metadata` and `dipdup_token_metadata` tables and is available via GraphQL and REST APIs.
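Token metadata can presumably be updated in the same way; the sketch below assumes a `ctx.update_token_metadata` helper symmetric to the contract one shown above - check the context reference of your DipDup version before relying on it.

```python
await ctx.update_token_metadata(  # assumption: helper symmetric to update_contract_metadata
    network='mainnet',
    address='KT1...',
    token_id='0',
    metadata={'name': 'Example Token', 'decimals': '8'},  # illustrative values
)
```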
Prometheus integration
This version introduces initial Prometheus integration. It could help you set up monitoring, find performance issues in your code, and so on. To enable this integration, add the following lines to the config:
prometheus:
  host: 0.0.0.0
🤓 SEE ALSO
Changes since 4.2.7
Added
- config: Added `custom` section to store arbitrary user data.
- metadata: Added `metadata_interface` feature flag to expose metadata in TzKT format.
- prometheus: Added ability to expose Prometheus metrics.
- tzkt: Added ability to process realtime messages with lag.
- tzkt: Added missing fields to the `HeadBlockData` model.
- tzkt: Added `iter_...` methods to iterate over item batches.
Fixed
- config: Fixed default SQLite path (`:memory:`).
- prometheus: Fixed invalid metric labels.
- tzkt: Fixed pagination in several getter methods.
- tzkt: Fixed data loss when `skip_history` option is enabled.
- tzkt: Fixed crash in methods that do not support cursor pagination.
- tzkt: Fixed possible OOM while calling methods that support pagination.
- tzkt: Fixed possible data loss in `get_originations` and `get_quotes` methods.
Changed
- tzkt: Added `offset` and `limit` arguments to all methods that support pagination.
Removed
- bcd: Removed `bcd` datasource and config section.
- cli: Removed `docker init` command.
- cli: Removed dummy `schema approve --hashes` flag.
- config: Removed dummy `advanced.oneshot` flag.
Performance
- dipdup: Use fast `orjson` library instead of built-in `json` where possible.
4.2.0
What's new
`ipfs` datasource
While working with contract/token metadata, a typical scenario is to fetch it from IPFS. DipDup now has a separate datasource to perform such requests.
datasources:
  ipfs:
    kind: ipfs
    url: https://ipfs.io/ipfs
You can use this datasource within any callback. Output is either JSON or binary data.
ipfs = ctx.get_ipfs_datasource('ipfs')
file = await ipfs.get('QmdCz7XGkBtd5DFmpDPDN3KFRmpkQHJsDgGiG16cgVbUYu')
assert file[:4].decode()[1:] == 'PDF'
file = await ipfs.get('QmSgSC7geYH3Ae4SpUHy4KutxqNH9ESKBGXoCN4JQdbtEz/package.json')
assert file['name'] == 'json-buffer'
You can tune HTTP connection parameters with the `http` config field, just like any other datasource.
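As a sketch, such tuning could look like the snippet below; `connection_timeout` is mentioned elsewhere in these notes, while `retry_count` is an assumed field name - consult the HTTP config reference for the exact set of options.

```yaml
datasources:
  ipfs:
    kind: ipfs
    url: https://ipfs.io/ipfs
    http:
      retry_count: 3           # assumed field name; verify against the HTTP config reference
      connection_timeout: 60   # seconds
```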
Sending arbitrary requests
DipDup datasources do not cover all available methods of underlying APIs. Let's say you want to fetch the protocol of the chain you're currently indexing from TzKT:
tzkt = ctx.get_tzkt_datasource('tzkt_mainnet')
protocol_json = await tzkt.request(
    method='get',
    url='v1/protocols/current',
    cache=False,
    weight=1,  # ratelimiter leaky-bucket drops
)
assert protocol_json['hash'] == 'PtHangz2aRngywmSRGGvrcTyMbbdpWdpFKuS4uMWxg2RaH9i1qx'
Datasource HTTP connection parameters (ratelimit, backoff, etc.) are applied on every request.
Firing hooks outside of the current transaction
When configuring a hook, you can instruct DipDup to wrap it in a single database transaction:
hooks:
  my_hook:
    callback: my_hook
    atomic: True
Until now, such hooks could only be fired according to `jobs` schedules, but not from a handler or another atomic hook using the `ctx.fire_hook` method. This limitation is now eliminated - use the `wait` argument to escape the current transaction:
async def handler(ctx: HandlerContext, ...) -> None:
    await ctx.fire_hook('atomic_hook', wait=False)
Spin up a new project with a single command
Cookiecutter is an excellent `jinja2` wrapper to initialize hello-world templates of various frameworks and toolkits interactively. Install the `python-cookiecutter` package systemwide, then call:
cookiecutter https://github.com/dipdup-net/cookiecutter-dipdup
Advanced scheduler configuration
DipDup utilizes the `apscheduler` library to run hooks according to schedules in the `jobs` config section. In the following example, `apscheduler` spawns up to three instances of the same job every time the trigger is fired, even if previous runs are in progress:
advanced:
  scheduler:
    apscheduler.job_defaults.coalesce: True
    apscheduler.job_defaults.max_instances: 3
See `apscheduler` docs for details.
Note that you can't use executors from the `apscheduler.executors.pool` module - a `ConfigurationError` exception is raised in that case. If you're into multiprocessing, I'll explain why in the next paragraph.
About the present and future of multiprocessing
It's impossible to use `apscheduler` pool executors with hooks because `HookContext` is not pickle-serializable. So, they are forbidden now in the `advanced.scheduler` config. However, thread/process pools can come in handy in many situations, and it would be nice to have them in the DipDup context. For now, I can suggest implementing custom commands as a workaround to perform any resource-hungry tasks within them. Put the following code in `<project>/cli.py`:
from contextlib import AsyncExitStack

import asyncclick as click

from dipdup.cli import cli, cli_wrapper
from dipdup.config import DipDupConfig
from dipdup.context import DipDupContext
from dipdup.utils.database import tortoise_wrapper


@cli.command(help='Run heavy calculations')
@click.pass_context
@cli_wrapper
async def do_something_heavy(ctx):
    config: DipDupConfig = ctx.obj.config
    url = config.database.connection_string
    models = f'{config.package}.models'

    async with AsyncExitStack() as stack:
        await stack.enter_async_context(tortoise_wrapper(url, models))
        ...

if __name__ == '__main__':
    cli(prog_name='dipdup', standalone_mode=False)  # type: ignore
Then use `python -m <project>.cli` instead of `dipdup` as an entrypoint. Now you can call `do-something-heavy` like any other `dipdup` command. The `dipdup.cli:cli` group handles argument and config parsing, graceful shutdown, and other boilerplate. The rest is on you; use `dipdup.dipdup:DipDup.run` as a reference. And keep in mind that Tortoise ORM is not thread-safe. I aim to implement `ctx.pool_apply` and `ctx.pool_map` methods to execute code in pools with magic within existing DipDup hooks, but there's no ETA yet.
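For instance, assuming the package is named `indexer` (an illustrative name), the command could be invoked like this; the `-c` option works the same way as with the stock `dipdup` entrypoint:

```shell
python -m indexer.cli -c dipdup.yml do-something-heavy
```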
That's all, folks! As always, your feedback is very welcome 🤙
4.1.0
⚠ Migration from 4.0 (optional)
- Run `dipdup schema init` on the existing database to enable the `dipdup_head_status` view and REST endpoint.
What's New
Index only the current state of big maps
`big_map` indexes allow achieving faster processing times than `operation` ones when storage updates are the only on-chain data your dapp needs to function. With this DipDup release, you can go even further and index only the current storage state, ignoring historical changes.
indexes:
  foo:
    kind: big_map
    ...
    skip_history: never|once|always
When this option is set to `once`, DipDup will skip historical changes only on initial sync and switch to regular indexing afterward. When the value is `always`, DipDup will fetch all big map keys on every restart. The preferable mode depends on your workload.
All big map diffs DipDup passes to handlers during fast sync have the `action` field set to `BigMapAction.ADD_KEY`. Keep in mind that DipDup fetches all keys in this mode, including ones removed from the big map. If needed, you can filter out the latter by the `BigMapDiff.data.active` field.
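A hedged handler sketch filtering out removed keys during such a sync; the handler name and body are illustrative, only `BigMapDiff` and the `data.active` field come from the text above.

```python
from dipdup.context import HandlerContext
from dipdup.models import BigMapDiff


async def on_ledger_update(ctx: HandlerContext, diff: BigMapDiff) -> None:
    # During skip_history sync every diff arrives as BigMapAction.ADD_KEY,
    # including keys that have already been removed from the big map on-chain.
    if not diff.data.active:
        return
    # Process the still-active key/value pair here.
    ...
```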
New datasource for contract and token metadata
Since the first version, DipDup has allowed fetching token metadata from the Better Call Dev API with the `bcd` datasource. Now it's time for a better solution. Firstly, BCD is far from being reliable in terms of metadata indexing. Secondly, spinning up your own instance of BCD requires significant effort and computing power. Lastly, we plan to deprecate the Better Call Dev API soon (but do not worry - it won't affect the explorer frontend).
Luckily, we have dipdup-metadata, a standalone companion indexer for DipDup written in Go. Configure a new datasource in the following way:
datasources:
  metadata:
    kind: metadata
    url: https://metadata.dipdup.net
    network: mainnet|hangzhounet
Now you can use it anywhere in your callbacks:
datasource = ctx.datasources['metadata']
token_metadata = await datasource.get_token_metadata(address, token_id)
The `bcd` datasource will remain available for a while, but we discourage using it for metadata processing.
Nested packages for hooks and handlers
Callback modules no longer have to be in the top-level `hooks`/`handlers` directories. Add one or multiple dots to the callback name to define nested packages:
package: indexer
hooks:
  foo.bar:
    callback: foo.bar
After running the `init` command, you'll get the following directory tree (shortened for readability):
indexer
├── hooks
│ ├── foo
│ │ ├── bar.py
│ │ └── __init__.py
│ └── __init__.py
└── sql
└── foo
└── bar
└── .keep
The same rules apply to handler callbacks. Note that the `callback` field must be a valid Python package name - lowercase letters, underscores, and dots.
New CLI commands and flags
- `schema init` is a new command to prepare a database for running DipDup. It will create tables based on your models, then call the `on_reindex` SQL hook to finish preparation - the same things DipDup does when run on a clean database.
- `hasura configure --force` flag allows configuring Hasura even if the metadata hash matches the one saved in the database. It may come in handy during development.
- `init --keep-schemas` flag makes DipDup preserve contract JSONSchemas. Usually, they are removed after generating typeclasses with `datamodel-codegen`, but you can keep them to convert to other formats or troubleshoot codegen issues.
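The commands and flags above can be tried directly from the shell; the config filename is illustrative:

```shell
dipdup -c dipdup.yml schema init
dipdup -c dipdup.yml hasura configure --force
dipdup -c dipdup.yml init --keep-schemas
```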
Built-in `dipdup_head_status` view and REST endpoint
DipDup maintains several internal models to keep its state. As Hasura generates GraphQL queries and REST endpoints for those models, you can use them for monitoring. However, some SaaS monitoring solutions can only check whether an HTTP response contains a specific word or not. For such cases the `dipdup_head_status` view was added - a simplified representation of the `dipdup_head` table. It returns `OK` when the datasource received a head less than two minutes ago and `OUTDATED` otherwise. The latter means that something is stuck: either DipDup (e.g., because of a database deadlock) or the TzKT instance. Or maybe the whole Tezos blockchain, but in that case, you have problems bigger than indexing.
$ curl "http://127.0.0.1:41000/api/rest/dipdupHeadStatus?name=https%3A%2F%2Fapi.tzkt.io"
{"dipdupHeadStatus":[{"status":"OUTDATED"}]}%
Note that the `dipdup_head` update may be delayed during sync even if the `--early-realtime` flag is enabled, so don't rely exclusively on this endpoint.
Changelog
Added
- cli: Added `schema init` command to initialize database schema.
- cli: Added `--force` flag to `hasura configure` command.
- codegen: Added support for subpackages inside callback directories.
- hasura: Added `dipdup_head_status` view and REST endpoint.
- index: Added an ability to skip historical data while synchronizing `big_map` indexes.
- metadata: Added `metadata` datasource.
- tzkt: Added `get_big_map` and `get_contract_big_maps` datasource methods.
4.0.0
⚠ Breaking Changes
- `run --oneshot` option is removed. The oneshot mode (DipDup stops after the sync is finished) applies automatically when the `last_level` field is set in the index config.
- `clear-cache` command is removed. Use `cache clear` instead.
⚠ Migration from 3.x
- Run `dipdup init` command to generate `on_synchronized` hook stubs.
- Run `dipdup schema approve` command on every database you want to use with 4.0.0. Running `dipdup migrate` is not necessary since `spec_version` hasn't changed in this release.
What's New
Performance optimizations
Overall indexing performance has been significantly improved. Key highlights:
- Configuration files are loaded 10x faster. The more indexes in the project, the more noticeable the difference is.
- Significantly reduced CPU usage in realtime mode.
- Datasource default HTTP connection options optimized for a reasonable balance between resource consumption and indexing speed.
Also, two new flags were added to improve DipDup performance in several scenarios: `merge_subscriptions` and `early_realtime`. See this paragraph for details.
Configurable action on reindex
There are several reasons that trigger reindexing:
reason | description |
---|---|
manual | Reindexing triggered manually from callback with `ctx.reindex`. |
migration | Applied migration requires reindexing. Check release notes before switching between major DipDup versions to be prepared. |
rollback | Reorg message received from TzKT, and it cannot be processed. |
config_modified | One of the index configs has been modified. |
schema_modified | Database schema has been modified. Try to avoid manual schema modifications in favor of SQL hooks. |
Now it is possible to configure desirable action on reindexing triggered by the specific reason.
action | description |
---|---|
exception (default) | Raise ReindexingRequiredError and quit with error code. The safest option since you can trigger reindexing accidentally, e.g., by a typo in config. Don't forget to set up the correct restart policy when using it with containers. |
wipe | Drop the whole database and start indexing from scratch. Be careful with this option! |
ignore | Ignore the event and continue indexing as usual. It can lead to unexpected side-effects up to data corruption; make sure you know what you are doing. |
To configure actions for each reason, add the following section to DipDup config:
advanced:
  ...
  reindex:
    manual: wipe
    migration: exception
    rollback: ignore
    config_modified: exception
    schema_modified: exception
New CLI commands and flags
command or flag | description |
---|---|
cache show | Get information about file caches used by DipDup. |
config export | Print config after resolving all links and variables. Add --unsafe option to substitute environment variables; default values from config will be used otherwise. |
run --early-realtime | Establish a realtime connection before all indexes are synchronized. |
run --merge-subscriptions | Subscribe to all operations/big map diffs during realtime indexing. This flag helps to avoid reaching the TzKT subscriptions limit (currently 10000 channels). Keep in mind that this option can significantly increase RAM consumption depending on the time required to perform a sync. |
status | Print the current status of indexes from the database. |
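A typical invocation combining the new flags from the table above, plus a status check afterwards; the config filename is illustrative:

```shell
dipdup -c dipdup.yml run --early-realtime --merge-subscriptions
dipdup -c dipdup.yml status
```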
`advanced` top-level config section
This config section allows users to tune system-wide options, either experimental or unsuitable for generic configurations.
field | description |
---|---|
early_realtime, merge_subscriptions, postpone_jobs | Another way to set run command flags. Useful for maintaining per-deployment configurations. |
reindex | Configure action on reindexing triggered. See this paragraph for details. |
CLI flags have priority over the self-titled `AdvancedConfig` fields.
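A sketch of the section using only the fields from the table above; the values are illustrative:

```yaml
advanced:
  early_realtime: True
  merge_subscriptions: True
  postpone_jobs: True
  reindex:
    config_modified: exception
```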
`aiosignalrcore` replaced with `pysignalr`
It may not be the most noticeable improvement for the end user, but it still deserves a separate paragraph in this article.
Historically, DipDup used our own fork of the `signalrcore` library named `aiosignalrcore`. This project aimed to replace the synchronous `websocket-client` library with asyncio-ready `websockets`. Later we discovered that the required changes made it hard to maintain backward compatibility, so we decided to rewrite this library from scratch. So now you have both a modern and reliable library for the SignalR protocol and a much more stable DipDup. Ain't it nice?
Changes since 3.1.3
This is a combined changelog of -rc versions released since the last stable release until this one.
Added
- cli: Added `run --early-realtime` flag to establish a realtime connection before all indexes are synchronized.
- cli: Added `run --merge-subscriptions` flag to subscribe to all operations/big map diffs during realtime indexing.
- cli: Added `status` command to print the current status of indexes from the database.
- cli: Added `config export [--unsafe]` command to print config after resolving all links and variables.
- cli: Added `cache show` command to get information about file caches used by DipDup.
- config: Added `first_level` and `last_level` optional fields to `TemplateIndexConfig`. These limits are applied after ones from the template itself.
- config: Added `daemon` boolean field to `JobConfig` to run a single callback indefinitely. Conflicts with `crontab` and `interval` fields.
- config: Added `advanced` top-level section.
- hooks: Added `on_synchronized` hook, which fires each time all indexes reach realtime state.
Fixed
- cli: Fixed config not being verified when invoking some commands.
- cli: Fixed crashes and output inconsistency when piping DipDup commands.
- cli: Fixed missing `schema approve --hashes` argument.
- cli: Fixed `schema wipe --immune` flag being ignored.
- codegen: Fixed contract address used instead of an alias when typename is not set.
- codegen: Fixed generating callback arguments for untyped operations.
- codegen: Fixed missing imports in handlers generated during init.
- coinbase: Fixed possible data inconsistency caused by caching enabled for method `get_candles`.
- hasura: Fixed unnecessary reconfiguration in restart.
- http: Fixed increasing sleep time between failed request attempts.
- index: Fixed `CallbackError` raised instead of `ReindexingRequiredError` in some cases.
- index: Fixed crash while processing storage of some contracts.
- index: Fixed incorrect log messages, removed duplicate ones.
- index: Fixed invocation of head index callback.
- index: Fixed matching of untyped operations filtered by `source` field (@pravin-d).
- tzkt: Fixed filtering of big map diffs by the path.
- tzkt: Fixed `get_originated_contracts` and `get_similar_contracts` methods whose output was limited to `HTTPConfig.batch_size` field.
- tzkt: Fixed lots of SignalR bugs by replacing `aiosignalrcore` library with `pysignalr`.
- tzkt: Fixed processing operations with entrypoint `default`.
- tzkt: Fixed regression in processing migration originations.
- tzkt: Fixed resubscribing when realtime connectivity is lost for a long time.
- tzkt: Fixed sending useless subscription requests when adding indexes in runtime.
Changed
- cli: `schema wipe` command now requires confirmation when invoked in the interactive shell.
- cli: `schema approve` command now also causes a recalculation of schema and index config hashes.
- index: DipDup will recalculate respective hashes if reindexing is triggered with `config_modified: ignore` or `schema_modified: ignore` in advanced config.
Removed
- cli: Removed deprecated `run --oneshot` argument and `clear-cache` command.
Performance
- config: Configuration files are loaded 10x times faster.
- index: Checks performed on each iteration of the main DipDup loop are slightly faster now.
- index: Number of operations processed by matcher reduced by 40%-95% depending on the number of addresses and entrypoints used.
- tzkt: Improved performance of response deserialization.
- tzkt: Rate limit was increased. Try to set `connection_timeout` to a higher value if requests fail with `ConnectionTimeout` exception.