Initial vault setup
---
title: Sosse
created: 2026-06-08
updated: 2026-06-08
type: app
tags: [catalogue, monitoring, app-marathon3-batch-b]
confidence: medium
contested: false
sources: [https://selfh.st/apps/?tag=monitoring&app=sosse]
---

# 🕷️ Sosse

> Web crawler and archiver for self-hosters: a home-grown Wayback Machine with periodic snapshots and full-text search.

## 📋 General Information

| Field | Value |
| :--- | :--- |
| **Website** | (community) |
| **GitHub** | (community/sosse) |
| **License** | MIT |
| **Language** | Python (Django) |
| **GitHub stars** | <500 ⭐ |
| **Category** | [[cat-monitoring\|Monitoring]] |

## 📝 Description

**Sosse** is a self-hosted web crawler that archives pages, captures screenshots, and indexes their content for full-text search. What sets it apart from **ArchiveBox** and the **Wayback Machine** is that Sosse is designed as a **proactive monitoring and archiving tool** (keywords, alerts when a page changes, diffs). Intended users are archivists, researchers, and developers or indie hackers who want to track how competitors' web pages evolve over time.

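The monitoring loop described above (detect that a page changed, show a diff, flag watched keywords) can be illustrated with a minimal standard-library sketch. This is not Sosse's implementation. The helper names `fingerprint`, `detect_change`, and `matched_keywords` are hypothetical, and the sketch assumes the page text has already been extracted from the HTML.

```python
import difflib
import hashlib


def fingerprint(text: str) -> str:
    """SHA-256 of the page text with every whitespace run collapsed to one space,
    so pure reflows (indentation, line breaks) do not count as a change."""
    normalised = " ".join(text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def detect_change(previous: str, current: str) -> list[str]:
    """Return unified-diff lines between two snapshots, or [] if unchanged."""
    if fingerprint(previous) == fingerprint(current):
        return []
    return list(
        difflib.unified_diff(
            previous.splitlines(),
            current.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )


def matched_keywords(text: str, keywords: list[str]) -> list[str]:
    """Case-insensitive keyword watch: which keywords appear in the snapshot."""
    lowered = text.lower()
    return sorted(k for k in keywords if k.lower() in lowered)


if __name__ == "__main__":
    old = "Pricing\nPro plan: 10 EUR/month"
    new = "Pricing\nPro plan: 12 EUR/month"
    for line in detect_change(old, new):
        print(line)
    print(matched_keywords(new, ["pricing", "enterprise"]))
```

Hashing first keeps the common case (nothing changed) cheap, and the diff is only computed when the fingerprints differ. The whitespace normalisation is a trade-off: it hides markup reflows, but it also hides changes that consist only of added or removed line breaks.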
## 🚀 Installation

### Docker Compose (recommended)

```yaml
services:
  sosse:
    image: ghcr.io/community/sosse:latest
    container_name: sosse
    restart: unless-stopped
    depends_on:
      - sosse-db
    environment:
      # Replace *** with your own values; the password in DATABASE_URL
      # must match POSTGRES_PASSWORD of the sosse-db service.
      - DJANGO_SECRET_KEY=***
      - DATABASE_URL=postgres://sosse:***@sosse-db:5432/sosse
    volumes:
      - sosse-data:/data
    labels:
      # Assumes a Traefik instance on the same Docker network, with a
      # "websecure" entrypoint and a "letsencrypt" certificate resolver.
      - traefik.enable=true
      - traefik.http.routers.sosse.rule=Host(`sosse.example.com`)
      - traefik.http.routers.sosse.entrypoints=websecure
      - traefik.http.routers.sosse.tls.certresolver=letsencrypt
      - traefik.http.services.sosse.loadbalancer.server.port=8000

  sosse-db:
    image: postgres:16-alpine
    container_name: sosse-db
    restart: unless-stopped
    environment:
      POSTGRES_USER: sosse
      POSTGRES_PASSWORD: changeMe  # change it, and use the same value in DATABASE_URL
      POSTGRES_DB: sosse
    volumes:
      - sosse-db:/var/lib/postgresql/data

  sosse-crawler:
    image: ghcr.io/community/sosse-crawler:latest
    container_name: sosse-crawler
    restart: unless-stopped
    depends_on:
      - sosse

volumes:
  sosse-data:
  sosse-db:
```

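Because `DATABASE_URL` repeats the credentials declared on `sosse-db`, a typo in either place only shows up when the app fails to connect. A small standard-library sketch can compare the two before running `docker compose up`. The helper `check_database_url` is hypothetical; the values below are copied from the compose file above, with `***` left unreplaced so that a mismatch is reported.

```python
from urllib.parse import urlsplit


def check_database_url(database_url: str, postgres_env: dict[str, str], db_service: str) -> list[str]:
    """Compare a postgres:// DATABASE_URL with the POSTGRES_* variables of the DB service.

    Returns a list of human-readable mismatches (an empty list means consistent)."""
    url = urlsplit(database_url)
    problems = []
    if url.scheme not in ("postgres", "postgresql"):
        problems.append(f"scheme is {url.scheme!r}, expected postgres")
    # Inside the Compose network, the host must be the DB service name.
    if url.hostname != db_service:
        problems.append(f"host is {url.hostname!r}, expected the service name {db_service!r}")
    if (url.port or 5432) != 5432:
        problems.append(f"port is {url.port}, postgres listens on 5432 by default")
    if url.username != postgres_env.get("POSTGRES_USER"):
        problems.append("user differs from POSTGRES_USER")
    if url.password != postgres_env.get("POSTGRES_PASSWORD"):
        problems.append("password differs from POSTGRES_PASSWORD")
    if url.path.lstrip("/") != postgres_env.get("POSTGRES_DB"):
        problems.append("database name differs from POSTGRES_DB")
    return problems


if __name__ == "__main__":
    db_env = {"POSTGRES_USER": "sosse", "POSTGRES_PASSWORD": "changeMe", "POSTGRES_DB": "sosse"}
    # Prints ['password differs from POSTGRES_PASSWORD'] until *** is replaced.
    print(check_database_url("postgres://sosse:***@sosse-db:5432/sosse", db_env, "sosse-db"))
```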
## 🔄 Alternatives

### Open Source
- **ArchiveBox**: full-featured web archiving, popular.
- **Wallabag**: read-it-later app (not a crawler).
- **Browsertrix Cloud**: high-fidelity crawler producing WACZ archives.
- **SingleFile**: browser extension that saves a single page.

### Proprietary
- **Wayback Machine (Internet Archive)**: cloud service, opaque.
- **Archive.today**: cloud service, manual snapshots.
- **Hunchly**: OSINT investigation tool, paid.

## 🔐 Security
- **Crawl scope**: restrict crawling to an allow-list of domains and respect robots.txt.
- **Storage**: keep snapshots on an encrypted disk.
- **HTTPS**: mandatory.
- **PII**: anonymize snapshots before making them public.

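Restricting the crawl scope comes down to two checks before each fetch: an allow-list of domains, then the site's robots.txt rules. Below is a minimal standard-library sketch that assumes the robots.txt body has already been downloaded. `in_scope`, `may_crawl`, the allow-list, and the `sosse` user-agent string are hypothetical and are not actual Sosse settings.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def in_scope(url: str, allowed_domains: set[str]) -> bool:
    """True if the URL's host is an allowed domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


def may_crawl(url: str, allowed_domains: set[str], robots_txt: str, user_agent: str = "sosse") -> bool:
    """Apply the domain allow-list first, then the robots.txt rules.

    robots.txt is per host: a real crawler would fetch it from each host's
    /robots.txt; here a single body is passed in for illustration."""
    if not in_scope(url, allowed_domains):
        return False
    robots = RobotFileParser()
    # Record a fetch time: can_fetch() refuses every URL until one is set.
    robots.modified()
    robots.parse(robots_txt.splitlines())
    return robots.can_fetch(user_agent, url)


if __name__ == "__main__":
    allowed = {"example.com"}
    robots_txt = "User-agent: *\nDisallow: /private/\n"
    for candidate in (
        "https://example.com/blog/post",
        "https://docs.example.com/",
        "https://example.com/private/report",
        "https://evil-example.com/",
    ):
        print(candidate, may_crawl(candidate, allowed, robots_txt))
```

Checking the allow-list first avoids fetching robots.txt for out-of-scope hosts. Matching on `"." + domain` admits subdomains while rejecting look-alike hosts such as `evil-example.com`.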
## 📚 Resources
- [GitHub](https://github.com/search?q=sosse+crawler)
- [ArchiveBox docs](https://github.com/ArchiveBox/ArchiveBox) (reference)

## Related Pages
- [[cat-monitoring]]: Monitoring category
- [[app-archivebox]]: competitor
- [[recettes-docker-compose]]: Docker templates