Privacy Policy
Dimtse — Privacy Notice
Effective date: _to be set at launch_ · Version: v1 (draft)
Dimtse ("Dimtse", "the platform", "we", "us") is a contributor platform for building
open speech and language resources for East African languages. Dimtse is operated by
Samic Ventures LLC, a limited liability company organized in the State of Wyoming,
United States ("the operator"). This notice explains what we collect from contributors,
why, who it is shared with, and the rights you have over it. It is written to be read,
not to be skimmed past — if anything here is unclear, contact us before you contribute.
This notice covers contributors (people who record their voice, rate audio, correct
transcripts, or submit text on Dimtse). It does not cover end-users of products
built by the operator (for example the Dewul phone service), which have their own
notices.
1. What we collect
When you contribute, we collect only what the work requires:
| Data | Why we collect it |
|---|---|
| Voice recordings | The core contribution — clips of you reading prompts, used to build open speech datasets and to train speech (ASR/TTS) models. |
| Ratings | Your 1–5 scores of audio naturalness / intelligibility, used to evaluate model quality (MOS panels). |
| Transcript corrections / text | Your fixes to machine transcripts, and any text you author, used to create gold-standard training pairs. |
| Display name | Shown in your contributor profile and, if you opt in, in dataset attribution. You may use a pseudonym. |
| Language, country (optional) | To balance datasets across dialects and to report fairness metrics. |
| Payout method + payout handle | E.g. "telebirr" plus the phone number or account the money goes to, so we can pay you. This is the most sensitive item we hold — see §5. |
| Basic operational logs | Timestamps and counts of accepted items, to compute what we owe you and to prevent abuse. |
We do not collect government IDs, precise location, contacts, or any data beyond
the table above. We do not use third-party advertising trackers.
2. The two consent scopes
Before you record or rate anything, Dimtse shows you a consent screen. Your agreement
covers two distinct, clearly stated uses, and recording does not begin until you
agree:
1. Open dataset release under CC-BY-SA-4.0. Your accepted recordings, ratings, and
text corrections may be published as part of an open dataset under the
Creative Commons Attribution-ShareAlike 4.0
license. This means others may reuse and build on the data, including commercially,
as long as they credit the source and share derivatives under the same license.
2. Use for AI model training. The same accepted contributions may be used to train,
fine-tune, and evaluate speech and language models (ASR, TTS, NLU) — including models
that the operator may license openly and commercially (for example, the voice
used by the Dewul service).
Both scopes are presented together and recorded with a version stamp so we always know
exactly what you agreed to. Agreeing to contribute means agreeing to both; if you are
not comfortable with either, please do not contribute.
3. Fair pay
Contribution on Dimtse is paid work, not a donation.
- You are paid a stated rate per accepted item (per voice recording, per rating),
  shown to you before you start.
- Rates are set to be fair for the local cost of living, benchmarked to sit above
  the applicable local minimum wage and in line with ethical research-participant norms
  (Masakhane principles), not driven to the cheapest possible number.
- The rate that applied when you contributed is the rate you are paid; we do not
  retroactively lower it.
- We compute what you are owed from your accepted-item counts and pay out via the method
  and handle you provide.
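
The payout arithmetic described above can be sketched as follows. This is a minimal illustration, not the platform's actual code: the `AcceptedItem` type, its field names, and the example rates are all assumptions made for this sketch. The point it shows is that each accepted item carries the rate that applied when it was contributed, so a later rate change cannot lower what is owed.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class AcceptedItem:
    """One accepted contribution (hypothetical record shape)."""
    kind: str                       # e.g. "recording" or "rating"
    rate_at_contribution: Decimal   # rate shown to the contributor at the time


def amount_owed(items: list[AcceptedItem]) -> Decimal:
    """Sum the per-item rates that applied when each item was contributed."""
    return sum((item.rate_at_contribution for item in items), Decimal("0"))


# Illustrative rates only; these are not Dimtse's real rates.
items = [
    AcceptedItem("recording", Decimal("2.50")),
    AcceptedItem("recording", Decimal("2.50")),
    AcceptedItem("rating", Decimal("0.40")),
]
total = amount_owed(items)
```

Storing the rate on the item, rather than looking up the current rate at payout time, is one simple way to honor the "no retroactive lowering" commitment.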
4. Who we share data with
- Open dataset recipients (the public). Only the **de-identified contribution
  content** you consented to release (audio, ratings, corrected text) plus, if you opted
  in, your chosen attribution name. Never your payout handle, phone number, or any
  contact detail.
- The operator's model-training pipeline (internal), under the consent scopes above.
- Payment processors / mobile-money providers, only the minimum needed to pay you.
- We do not sell your personal data, and we do not share it with advertisers.
5. How we protect your payout handle and contact details
Your payout handle is the one piece of directly identifying data we must keep, and
we treat it accordingly:
- It is stored encrypted at rest and is accessible only to the payout process.
- It is never included in any dataset, model, export, or public artifact.
- It is never released, even in aggregate.
- Access is limited to operating the payout, and it is deleted on request (see §7).
6. De-identification
Before any contribution leaves the platform for a dataset or model, we remove direct
identifiers from text:
- Phone numbers, email addresses, and long digit runs (account/card/ID numbers) are
  stripped and replaced with placeholders.
- Obvious self-introduced names in transcripts are redacted.
- Recordings are referenced by an internal key, never by your name or number.
De-identification is applied in code, automatically, as a gate the data must pass
through — not as a manual afterthought. (For data originating from the operator's Dewul
phone service, additional, stricter rules apply: raw call audio and personal transcripts
are never publicly released — only de-identified, derived data, and only with caller
consent. See `docs/DATA_GOVERNANCE.md`.)
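
As a rough illustration of what an automatic de-identification gate for text might look like, here is a minimal sketch. It is not the platform's actual implementation: the regular expressions, the `[EMAIL]` and `[NUMBER]` placeholder tokens, and the function names are assumptions made for this example. For simplicity it treats phone numbers and other long digit runs with a single pattern, and it does not attempt name redaction.

```python
import re

# Hypothetical patterns; real rules would need tuning per language and locale.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Six or more digits, optionally separated by spaces, dots, dashes, or parentheses.
# Deliberately conservative: it may also catch harmless digit sequences.
DIGIT_RUN_RE = re.compile(r"\+?\d(?:[\s().-]*\d){5,}")


def deidentify(text: str) -> str:
    """Replace email addresses and long digit runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return DIGIT_RUN_RE.sub("[NUMBER]", text)


def release_gate(text: str) -> str:
    """Return cleaned text, or raise if an identifier pattern still matches."""
    cleaned = deidentify(text)
    if EMAIL_RE.search(cleaned) or DIGIT_RUN_RE.search(cleaned):
        raise ValueError("identifier survived de-identification")
    return cleaned


sample = "Call +251 911 234 567 or write to abebe@example.com, account 1234567890."
cleaned = release_gate(sample)
```

Running the check a second time on the cleaned text is what makes this a gate rather than a best-effort filter: data that still matches an identifier pattern is rejected instead of being passed along.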
7. Your rights: withdrawal and deletion
You stay in control of your contributions:
- Withdraw at any time. You may stop contributing whenever you like; this does not
  affect pay already earned for accepted items.
- Right to deletion. You may ask us to delete your contributions and your personal
  details (display name, payout handle, country). On a verified request we:
  - remove your recordings, ratings, and text from our active store, and
  - purge them from all derived datasets and model-training inputs at the next build,
    so withdrawn contributions stop flowing into future releases.
- Important limits, stated honestly:
  - Open datasets already published under CC-BY-SA-4.0 cannot be recalled from people
    who already downloaded them; that is inherent to an open license. We remove your
    data from the *next* release and our copies; we cannot un-distribute past copies.
  - A model that was already trained on your data before withdrawal cannot have that
    single contribution surgically removed; we exclude your data from subsequent
    training and releases.
- Access / correction. You may ask what we hold about you and have it corrected.
To exercise any of these, contact us (§9). We aim to respond within 30 days.
8. Retention
- Payout handle / contact details: kept only as long as needed to pay you and meet
  legal/accounting obligations, then deleted.
- Contributions: kept while the project is active or until you withdraw, subject to
  the open-license limit in §7.
- Operational logs: kept for a limited period for accounting and abuse-prevention.
9. Contact and governing law
- Operator: Samic Ventures LLC, Wyoming, USA.
- Contact: privacy@dimtse.ai (placeholder — set the real address at launch).
- Governing law: this notice is governed by the laws of the State of Wyoming, USA,
without regard to its conflict-of-laws rules. Where mandatory local data-protection law
applies to you as a contributor, we honor the stronger protection.
10. African data-ethics commitments
Dimtse exists to build language resources with East African communities, not to
extract from them. We commit to:
- Fair wage for contribution (§3).
- Attribution option: you may choose to be credited as a contributor in released
  datasets.
- Community benefit: the resulting datasets are released openly (CC-BY-SA-4.0) so
  the communities whose languages they represent can use and build on them, and
  performance is reported across dialects and genders rather than hidden.
- No parachute research: local co-investigators share authorship, budget, and
  governance of the resulting resources.
11. Changes
If we change this notice materially, we will update the version and effective date and
notify active contributors. Your existing consent record always reflects the version you
agreed to.