Hello, I'm Alexandre

Data Scientist

I transform data into valuable insights through statistical analysis, machine learning and data visualization to support strategic decisions.

Alexandre

About Me

Get to know my journey and experience

I'm a Data Scientist passionate about discovering patterns and hidden insights in data. With experience in statistical analysis, machine learning and data visualization, I work to transform raw data into strategic information.

My approach combines scientific rigor with creativity to solve complex problems and generate real value for organizations through data-driven solutions.

9+ Completed Projects
10+ Technologies
100% Dedication
Working with data

Professional Experience

My journey and key accomplishments

Data Scientist

Independent Projects 2022 - Present
  • Delivered 9+ end-to-end data science projects focused on predictive analytics and executive storytelling.
  • Implemented ETL/ELT pipelines with Apache Airflow and dbt to process large data volumes reliably.
  • Built executive Power BI dashboards that generated actionable insights for decision-makers.
  • Trained machine learning models reaching an average of 92% accuracy on classification problems.
Python SQL Power BI Airflow Docker MongoDB

Technology Stack

Tools and platforms I master

Languages

Python
9+ years
SQL
6+ years
JavaScript
3+ years

Data Science & ML

Pandas
Expert
Scikit-learn
Advanced
TensorFlow
Intermediate
NumPy
Advanced

Visualization & BI

Power BI
Expert
Matplotlib
Advanced
Plotly
Advanced

Data Engineering

Apache Airflow
Advanced
Docker
Advanced
dbt
Advanced

Databases

PostgreSQL
Advanced
MongoDB
Advanced
MySQL
Advanced

Cloud & DevOps

Google Cloud
Intermediate
Git
Advanced
CI/CD Pipelines
Advanced

Technical Skills

How I interact with technologies in my daily work

alexandre@skillset:~$
Python
>>> if isinstance(alexandre, PythonExpert) and alexandre.experience > 5:
...     print("Automation, analysis, and APIs? Just call me!")
...
Automation, analysis, and APIs? Just call me!
SQL
sql> SELECT skill, proficiency FROM expertise WHERE developer = 'alexandre' AND skill LIKE '%SQL%' AND proficiency = 'Advanced'; /* Optimized query, indexes used! */
skill: SQL | proficiency: Advanced
Pandas
>>> import pandas as pd
>>> df = pd.read_csv('challenge.csv')
>>> df.groupby('problem').apply(alexandre.solve)  # Result: insights ready for decision-making
DataFrame transformed successfully!
Docker
$ docker build -t alexandre/solution:latest . && docker run --rm -e PROBLEM=complex alexandre/solution:latest
Container running... Problem solved!
Power BI
DAX> CALCULATE([Insights], FILTER(ALL('Projects'), [Author]="Alexandre" && [Level]="Advanced" ) )
Interactive dashboard created!
Git
$ git commit -m "feat: robust solution delivered by Alexandre" && git push origin main
[main abc123] feat: robust solution delivered by Alexandre
MongoDB
> db.skills.aggregate([ { $match: { user: "alexandre", skill: "MongoDB" } }, { $project: { proficiency: 1, pipelines: 1 } } ])
{ "proficiency": "Advanced", "pipelines": "Expert" }
Airflow
>>> with DAG('alexandre_pipeline', schedule='@daily') as dag:
...     run_etl = PythonOperator(task_id='run_etl', python_callable=alexandre.advanced_etl)
...
Airflow pipeline active and monitored!
Matplotlib
>>> import matplotlib.pyplot as plt
>>> plt.style.use('alexandre_custom')
>>> plt.plot(data, color='insights')
>>> plt.title('Another visualization that tells a story')
>>> plt.savefig('storytelling_with_data.png')
Chart saved: storytelling_with_data.png
Machine Learning
>>> from sklearn.ensemble import RandomForestClassifier
>>> model = alexandre.train_model(data, target)
>>> print(f"Accuracy: {model.score(X_test, y_test):.1%}")
Accuracy: 94.7% - Model ready for production!

Certifications

Validations of my technical competencies

Verified competencies

Automatically synced from Credly

Featured Projects

Demonstrations of technical rigor and production readiness

Flagship Projects

5 main projects that demonstrate technical rigor, reproducibility, and production focus

🌟 Flagship

Data Engineer Case - DataOps Pipeline

Complete data processing system with DataOps pipeline that integrates Python and MongoDB, offering full support for local, remote, and Docker environments with advanced ETL/ELT features.

Reproducible Documented Tests CI/CD Limitations

How to Run

  1. Clone the repository: git clone https://github.com/alex-des-santos/Case-Engenheiro-dados
  2. Set up MongoDB or use Docker: docker-compose up -d
  3. Run the pipeline: python main.py
  4. Access the full documentation in the README

Impact & Decisions

  • Decision: Automated ETL/ELT pipeline with data validation and multi-environment support
  • Benefit: Reliable data processing with guaranteed quality through programmatic validations
  • Trade-off: Initial setup complexity vs flexibility for local/remote/Docker environments
Python MongoDB Pandas Docker DataOps
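The validation step described under Impact & Decisions can be illustrated with a minimal extract, validate, load sketch in Python. It is not code from the Case-Engenheiro-dados repository: the schema, field names, and sample rows are hypothetical, and a plain Python list stands in for the MongoDB collection so the example runs without a database.

```python
import csv
import io
from dataclasses import dataclass, field

# Hypothetical target schema (field -> type caster); not taken from the repository.
SCHEMA = {"order_id": int, "customer": str, "amount": float}


@dataclass
class ValidationReport:
    valid: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (row_number, reason) pairs


def extract(raw_csv: str) -> list:
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def validate(rows: list) -> ValidationReport:
    """Transform and validate: cast every field, reject rows that break a rule."""
    report = ValidationReport()
    for row_number, row in enumerate(rows, start=1):
        try:
            clean = {}
            for name, cast in SCHEMA.items():
                value = (row.get(name) or "").strip()
                if not value:
                    raise ValueError(f"missing {name}")
                clean[name] = cast(value)  # int("x") or float("x") raises ValueError
            if clean["amount"] < 0:
                raise ValueError("negative amount")
            report.valid.append(clean)
        except ValueError as exc:
            report.rejected.append((row_number, str(exc)))
    return report


def load(records: list, sink: list) -> int:
    """Load: append to an in-memory sink standing in for a MongoDB collection.

    With pymongo, this step would typically be collection.insert_many(records).
    """
    sink.extend(records)
    return len(records)


raw = "order_id,customer,amount\n1,Ana,10.5\n2,,3\nx,Bob,4\n3,Caio,-1\n"
report = validate(extract(raw))
collection = []
loaded = load(report.valid, collection)
print(loaded, report.rejected)
```

Rejected rows keep their position and reason, so they can be logged or quarantined instead of silently disappearing.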
🌟 Flagship

KPIs Governance Dashboard

Executive dashboard for data governance and strategic KPIs, with modern lakehouse architecture and automated ETL/ELT pipeline for decisions based on reliable data.

Reproducible Baseline Documented Tests Limitations

How to Run

  1. Clone: git clone https://github.com/alex-des-santos/kpis-governance-dashboard
  2. Install dependencies: npm install or pip install -r requirements.txt
  3. Run ETL/ELT pipeline (Airflow or script)
  4. Open dashboard (Power BI or Metabase)

Impact & Decisions

  • Decision: Which KPIs to prioritize for data governance and quality
  • Benefit: Centralized visibility of business metrics with guaranteed quality
  • Trade-off: Update latency vs quality assurance (currently simulated data)
Data Engineering Apache Airflow dbt Data Governance Great Expectations Power BI
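To make governance KPIs concrete, here is a minimal Python sketch of two common data-quality indicators, completeness and freshness. The column names, timestamps, and the 24-hour threshold are hypothetical; the project's actual KPI definitions and its Great Expectations suites are not reproduced here.

```python
from datetime import datetime, timezone


def completeness(rows: list, column: str) -> float:
    """Share of rows in which `column` is present and non-empty (0.0 for an empty table)."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) not in (None, ""))
    return filled / len(rows)


def freshness_hours(last_loaded_at: datetime, now: datetime) -> float:
    """Hours elapsed since the last successful load (both datetimes aware, or both naive)."""
    return (now - last_loaded_at).total_seconds() / 3600


# Toy rows and timestamps, for illustration only.
customers = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": ""},
    {"customer_id": 3, "email": None},
    {"customer_id": 4, "email": "d@example.com"},
]
kpis = {
    "email_completeness": completeness(customers, "email"),
    "hours_since_load": freshness_hours(
        datetime(2025, 1, 1, 6, 0, tzinfo=timezone.utc),
        datetime(2025, 1, 1, 9, 30, tzinfo=timezone.utc),
    ),
}
# Hypothetical service-level threshold a governance team might agree on.
kpis["freshness_ok"] = kpis["hours_since_load"] <= 24
print(kpis)
```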
🌟 Flagship

Adventure Works - Executive Data Analysis

Executive dashboard for financial analysis of Adventure Works with revenue forecasts using Machine Learning and market opportunity analysis.

Reproducible Baseline Documented Validation Limitations
94.7% Accuracy (vs. naïve baseline: 78%)
23% Growth Opportunity

How to Run

  1. Access the online dashboard or download the .pbix file
  2. Open in Power BI Desktop (free version)
  3. Update the data sources if necessary
  4. Explore executive views and ML forecasts

Impact & Decisions

  • Decision: Expansion to Brazilian market based on opportunity analysis (23% potential growth)
  • Benefit: Revenue forecasts about 17 percentage points more accurate than the naïve forecast (94.7% vs. 78%), supporting strategic planning
  • Trade-off: Accuracy vs interpretability (complex model vs simple rules for executives)
Power BI Machine Learning SQL Business Intelligence DAX
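The headline comparison (94.7% accuracy vs. a 78% naïve baseline) can be stated in three ways that are easy to conflate: an absolute gain in percentage points, a relative gain, and a relative reduction in errors. A small Python calculation using only those two figures:

```python
model_acc = 0.947     # reported model accuracy
baseline_acc = 0.78   # reported naive-baseline accuracy

# Absolute difference, in percentage points (pp).
pp_gain = (model_acc - baseline_acc) * 100
# Improvement relative to the baseline's own accuracy.
relative_gain = (model_acc - baseline_acc) / baseline_acc
# Share of the baseline's errors that the model eliminates.
error_reduction = ((1 - baseline_acc) - (1 - model_acc)) / (1 - baseline_acc)

print(f"+{pp_gain:.1f} pp | +{relative_gain:.1%} relative | {error_reduction:.1%} fewer errors")
# → +16.7 pp | +21.4% relative | 75.9% fewer errors
```

All three are legitimate, so the metric should be named explicitly whenever an improvement is quoted.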
🌟 Flagship

CSV Insights Tool - GenAI for Analytics

Interactive web tool that uses generative AI for automatic analysis of CSV files, generating statistical insights, visualizations, and recommendations in real time.

Reproducible Documented Evaluation Security Observability
70% Time Reduction vs Manual
<3 s to Process 500 MB

How to Run

  1. Access online tool (no installation)
  2. Upload CSV file (up to 500 MB)
  3. Wait for automated analysis
  4. Explore AI-generated insights, visualizations, and recommendations

Impact & Decisions

  • Decision: Prioritization of exploratory analyses and quick pattern identification
  • Benefit: 70% time reduction vs manual analysis with Pandas/Excel for new datasets
  • Trade-off: Speed vs accuracy (AI-generated insights may contain hallucinations, so manual validation is recommended)
JavaScript GenAI D3.js Data Visualization OpenAI
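Single-pass statistics are one way to profile CSV files that are too large to load comfortably in memory. The Python sketch below uses the standard csv module and Welford's online algorithm for mean and variance; it illustrates the general technique only and is not the tool's JavaScript implementation. The sample data is invented.

```python
import csv
import io
import math


class RunningStats:
    """Welford's online algorithm: count, mean, variance, min, max in one pass."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.min = math.inf
        self.max = -math.inf

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        self.min = min(self.min, x)
        self.max = max(self.max, x)

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); NaN with fewer than two values."""
        return self.m2 / (self.n - 1) if self.n > 1 else math.nan


def profile_csv(stream) -> dict:
    """Read rows once and keep RunningStats for every column that stays numeric.

    A column is treated as text (and dropped) as soon as one value fails to parse.
    """
    stats, text_columns = {}, set()
    for row in csv.DictReader(stream):
        for col, raw in row.items():
            if col is None or col in text_columns or raw in (None, ""):
                continue
            try:
                value = float(raw)
            except ValueError:
                text_columns.add(col)
                stats.pop(col, None)
                continue
            if col not in stats:
                stats[col] = RunningStats()
            stats[col].update(value)
    return stats


# Invented sample; for a real file, pass open(path, newline="") instead.
summary = profile_csv(io.StringIO("city,temp,rain\nA,20,1.5\nB,22,0\nC,27,3.0\n"))
for name, s in summary.items():
    print(name, s.n, round(s.mean, 2), round(s.variance, 2), s.min, s.max)
```

Memory use grows with the number of columns, not the number of rows, which is what makes this approach suitable for very large files.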
🌟 Flagship

ML Production Pipeline - Complete MLOps

End-to-end Machine Learning production pipeline with experiment tracking (MLflow), inference API (FastAPI), and containerized infrastructure. Wine quality classification with 88% F1 Score.

Reproducible Documented Tests CI/CD Observability
88.2% F1 Score
90.9% ROC AUC
12 Experiments

How to Run

  1. Clone the repository: git clone https://github.com/alex-des-santos/ml-production-pipeline
  2. Start the services: docker compose up -d
  3. Train the model: docker exec ml-api python3 train.py
  4. Access MLflow UI at localhost:5000 and API at localhost:8001/docs

Impact & Decisions

  • Decision: MLflow for full experiment tracking and model versioning
  • Benefit: Full reproducibility, since any experiment can be recreated and any model version rolled back
  • Trade-off: Infrastructure complexity vs model governance and auditability
Python FastAPI MLflow Docker scikit-learn Prometheus
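For reference, the two headline metrics can be computed from scratch for the binary case. This is a minimal Python sketch with invented toy labels; in practice, f1_score and roc_auc_score from sklearn.metrics would be the usual choice, and nothing here assumes whether the project frames wine quality as a binary or multi-class task.

```python
def f1_score(y_true: list, y_pred: list, positive=1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: list, scores: list, positive=1) -> float:
    """Probability that a random positive scores higher than a random negative.

    Ties count as one half. This pairwise version is O(n_pos * n_neg), fine for
    small samples; rank-based formulas scale better.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy labels and scores, for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.65, 0.3, 0.35, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # threshold probabilities at 0.5
print(f"F1 = {f1_score(y_true, y_pred):.3f}, ROC AUC = {roc_auc(y_true, scores):.3f}")
```

F1 depends on the chosen decision threshold, while ROC AUC summarizes ranking quality across all thresholds, which is why the two are reported together.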

Data Scientists Guide

Open guide to build a Data Science portfolio with checklists, templates, and practical references.

Mentoring Documentation Python Best Practices

Habits and Performance Analysis

Study that cross-references students' habits and academic performance using notebooks, visualizations, and statistical experiments.

Python Pandas EDA Statistics

AI Resume Optimizer

Engine that compares job openings with CVs, suggests improvements, and powers the official Chrome extension for automated optimization.

NLP Python Career Chrome Extension
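A very simple way to compare a job posting with a CV is keyword coverage: the share of the posting's terms that also appear in the CV, plus the list of missing terms. The Python sketch below is a hypothetical illustration of that idea, not the engine behind the AI Resume Optimizer, which may use entirely different NLP techniques; the stopword list is an assumption.

```python
import re

# Hypothetical stopword list; a real system would use a proper NLP pipeline.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "with", "or", "experience"}


def terms(text: str) -> set:
    """Lowercase word-like tokens (keeps symbols such as c++ or c#), minus stopwords."""
    return {t for t in re.findall(r"[a-z0-9+#]+", text.lower()) if t not in STOPWORDS}


def keyword_gap(job_post: str, resume: str) -> tuple:
    """Return (coverage, missing): share of job terms found in the CV, and the rest."""
    job, cv = terms(job_post), terms(resume)
    coverage = len(job & cv) / len(job) if job else 0.0
    return coverage, sorted(job - cv)


coverage, missing = keyword_gap(
    "Python, SQL and Airflow experience with dbt",
    "Data scientist with Python and SQL",
)
print(f"{coverage:.0%} covered, missing: {missing}")
```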

Learn Machine Learning Visually

Interactive educational platform to teach machine learning concepts through dynamic visualizations and accessible practical examples.

JavaScript D3.js Educational Tech Interactive Learning

Windows Event Log Analyzer

Modern, intuitive web tool for analyzing Windows Event Viewer logs that uses Google Gemini to generate AI insights and automatically identify critical issues.

JavaScript Google Gemini AI CSV Analysis System Monitoring
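Before logs are sent to an LLM, it often helps to pre-filter and aggregate the most severe events. This Python sketch counts Critical and Error entries by source and event ID. It assumes column names modeled on an Event Viewer CSV export (Level, Date and Time, Source, Event ID, Task Category), uses invented sample rows, and does not call Gemini or reflect the analyzer's actual code.

```python
import csv
import io
from collections import Counter

# Assumed severity labels and column names, modeled on an Event Viewer CSV export.
SEVERE_LEVELS = {"Critical", "Error"}


def top_severe_events(stream, limit: int = 3) -> list:
    """Count Critical/Error rows per (Source, Event ID) and return the most frequent."""
    counts = Counter()
    for row in csv.DictReader(stream):
        if row.get("Level") in SEVERE_LEVELS:
            counts[(row.get("Source"), row.get("Event ID"))] += 1
    return counts.most_common(limit)


# Invented sample rows for illustration.
log = io.StringIO(
    "Level,Date and Time,Source,Event ID,Task Category\n"
    "Error,2024-05-01 10:00:00,Disk,7,None\n"
    "Information,2024-05-01 10:01:00,Service Control Manager,7036,None\n"
    "Error,2024-05-01 10:05:00,Disk,7,None\n"
    "Critical,2024-05-01 10:07:00,Kernel-Power,41,(63)\n"
)
top = top_severe_events(log)
print(top)  # a compact summary like this could then be included in an LLM prompt
```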

Projects in Development

✅ Completed

End-to-End ML in Production

Complete Machine Learning pipeline from training to production: tracking with MLflow, REST API with FastAPI, Docker containerization, automated CI/CD and data drift monitoring.

MLflow • FastAPI • Docker • Scikit-learn • Pytest • GitHub Actions • Prometheus
Gap: MLOps & ML in Production
View Project GitHub
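Data drift monitoring is often based on the Population Stability Index (PSI), which compares the binned distribution of a feature in production against its training-time reference. The following is a minimal Python sketch with equal-width bins; it is one common approach, not necessarily the one used in the project, and the sample data is synthetic.

```python
import math


def psi(expected: list, actual: list, bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a reference sample and a new sample.

    Bins are equal-width over the reference range; values outside that range are
    clipped into the first or last bin. Empty bins get a small epsilon so the
    logarithm stays defined.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference column

    def proportions(values: list) -> list:
        counts = [0] * bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic data: the production sample is the reference shifted upward by 50.
reference = [float(i) for i in range(100)]
shifted = [v + 50 for v in reference]
print(f"{psi(reference, reference):.3f}", f"{psi(reference, shifted):.3f}")
```

A commonly quoted rule of thumb reads PSI below 0.1 as stable and above 0.25 as a significant shift; such thresholds are heuristics and should be calibrated per feature.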
📋 Planned Q1 2026

Real-World Analytics Engineering

Modern lakehouse architecture: public data ingestion → dbt (staging/intermediate/marts) → Great Expectations for data quality → versioned metrics → executive dashboard with BI.

dbt Core • Great Expectations • Apache Airflow • DuckDB/BigQuery • Power BI
Gap: Data Governance & Data Contracts
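A data contract can be expressed as a declared schema that every batch must satisfy before it reaches the marts. The Python sketch below checks column presence, types, and nullability; the table and column names are hypothetical, and a real implementation would more likely rely on dbt model contracts or Great Expectations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    dtype: type
    nullable: bool = False


# Hypothetical contract for an orders mart; names and types are illustrative.
ORDERS_CONTRACT = (
    Column("order_id", int),
    Column("order_date", str),
    Column("revenue", float),
    Column("coupon_code", str, nullable=True),
)


def contract_violations(rows: list, contract: tuple) -> list:
    """List every schema breach: unexpected columns, nulls, and wrong types."""
    allowed = {col.name for col in contract}
    violations = []
    for i, row in enumerate(rows):
        extra = sorted(set(row) - allowed)
        if extra:
            violations.append(f"row {i}: unexpected columns {extra}")
        for col in contract:
            value = row.get(col.name)  # a missing column is treated as null
            if value is None:
                if not col.nullable:
                    violations.append(f"row {i}: {col.name} is null")
            elif not isinstance(value, col.dtype):
                violations.append(
                    f"row {i}: {col.name} expected {col.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
    return violations


batch = [
    {"order_id": 1, "order_date": "2025-01-03", "revenue": 99.9, "coupon_code": None},
    {"order_id": "2", "order_date": "2025-01-04", "revenue": None, "channel": "web"},
]
for problem in contract_violations(batch, ORDERS_CONTRACT):
    print(problem)
```

Failing the load when this list is non-empty is what turns a documented schema into an enforced contract.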
📋 Planned Q1 2026

GenAI with Rigorous Evaluation

RAG system for technical documentation with traceable citations, automated evaluation (RAGAS), security hardening against prompt injection, complete telemetry, and regression tests.

LangChain • ChromaDB • RAGAS • OpenAI • FastAPI • Guardrails • LangSmith
Gap: GenAI/RAG in Production with Rigor
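Retrieval quality is usually measured alongside generation quality. As a simpler stand-in for RAGAS (which scores aspects such as faithfulness and context relevance), this Python sketch computes hit rate@k and mean reciprocal rank over a hand-labeled set of questions; the document IDs are invented.

```python
def hit_rate_at_k(retrieved: list, relevant: list, k: int) -> float:
    """Share of questions with at least one relevant document in the top k results."""
    hits = sum(1 for docs, gold in zip(retrieved, relevant) if set(docs[:k]) & gold)
    return hits / len(retrieved)


def mean_reciprocal_rank(retrieved: list, relevant: list) -> float:
    """Average of 1/rank of the first relevant document (0 when none is retrieved)."""
    total = 0.0
    for docs, gold in zip(retrieved, relevant):
        for rank, doc in enumerate(docs, start=1):
            if doc in gold:
                total += 1 / rank
                break
    return total / len(retrieved)


# Invented evaluation set: ranked doc IDs per question, and the IDs marked relevant.
retrieved = [["d3", "d1", "d7"], ["d2", "d9", "d4"], ["d5", "d6", "d8"]]
relevant = [{"d1"}, {"d4"}, {"d0"}]

print(hit_rate_at_k(retrieved, relevant, k=1), hit_rate_at_k(retrieved, relevant, k=3))
print(round(mean_reciprocal_rank(retrieved, relevant), 3))
```

In a regression test, these values can be asserted against a stored baseline, so a change to chunking or embeddings that degrades retrieval fails the build.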

More Projects on GitHub

Complete list of my own repositories (no forks) focused on AI, data, automation, and productivity.

Repository

agentdemo24por7

24/7 multi-agent orchestrator that triages tickets autonomously with LangGraph/LangChain state machines.

LLMs Agents Automation
Open on GitHub
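Triaging tickets with a state machine can be sketched with an explicit transition table. This is a hypothetical, framework-free Python illustration: the states and events are invented, and it does not use the LangGraph or LangChain APIs.

```python
from enum import Enum, auto


class State(Enum):
    NEW = auto()
    TRIAGED = auto()
    ESCALATED = auto()
    RESOLVED = auto()


# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    (State.NEW, "classify"): State.TRIAGED,
    (State.TRIAGED, "auto_fix"): State.RESOLVED,
    (State.TRIAGED, "needs_human"): State.ESCALATED,
    (State.ESCALATED, "human_fix"): State.RESOLVED,
}


def step(state: State, event: str) -> State:
    """Apply one event, rejecting transitions that are not in the table."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"invalid transition: {state.name} on {event!r}") from None


def run(events: list, state: State = State.NEW) -> list:
    """Replay a sequence of events and return the visited states."""
    history = [state]
    for event in events:
        state = step(state, event)
        history.append(state)
    return history


print([s.name for s in run(["classify", "needs_human", "human_fix"])])
```

In an agent setup, each event would come from a model's decision about the ticket; keeping the allowed transitions explicit makes every autonomous step auditable.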

Repository

alex-des-santos

Interactive README that highlights mission, principles, and professional roadmap.

Portfolio Markdown Community
Open on GitHub

Repository

analise-habitos-desempenho

Research that connects students' habits and performance using notebooks, visualizations, and statistical experiments.

Python Pandas EDA
Open on GitHub

Repository

Case-Analista-Dados

Adventure Works executive dashboard with forecasting, What-If analysis, and Brazil-specific storyline.

Power BI Forecast Business
Open on GitHub

Repository

Case-Engenheiro-dados

DataOps challenge with Python + MongoDB pipelines, Dockerized environments, and full operational docs.

DataOps MongoDB Docker
Open on GitHub

Repository

csv-insights-tool

Browser-based CSV analyzer that produces instant statistics and D3.js charts for huge files.

JavaScript D3.js Data Viz
Open on GitHub

Repository

dash-desmatamento-fogo-chuva

Plotly Dash dashboard that cross-analyzes deforestation, fire spots, and rainfall across Brazilian biomes.

Dash APIs Sustainability
Open on GitHub

Repository

datascients-guide

Open guide to build a compelling data science portfolio with checklists, templates, and curated references.

Mentoring Documentation Python
Open on GitHub

Repository

encceja-pandemia-impact-analysis

Pandemic impact analysis on ENCCEJA results combining public datasets and notebook storytelling.

Pandas Education Storytelling
Open on GitHub

Repository

kpis-governance-dashboard

End-to-end data governance case with strategic KPIs, architecture diagrams, and BI deliverables.

Data Governance dbt Power BI
Open on GitHub

Repository

machine-learning-alg

Visual playground that explains ML algorithms with interactive demos and intuitive charts.

JavaScript Education ML
Open on GitHub

Repository

network-troubleshooting

Interactive guide that walks home users through network diagnostics via dynamic Q&A flows.

UX Web App Networking
Open on GitHub

Repository

promptsections

Streamlit app that breaks long Stable Diffusion/ComfyUI prompts into reusable sections for faster iteration.

Streamlit Stable Diffusion Generative AI
Open on GitHub
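A core step for an app like this is splitting a prompt into sections without breaking weighted groups. The Python sketch below splits only on top-level commas and drops case-insensitive duplicates; it assumes the common "(term:weight)" bracket syntax used by Stable Diffusion front ends and is not the app's actual code.

```python
def split_prompt(prompt: str) -> list:
    """Split on top-level commas only, so groups like (a, b:1.2) stay intact."""
    sections, current, depth = [], [], 0
    for ch in prompt:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth = max(depth - 1, 0)  # tolerate unbalanced closing brackets
        if ch == "," and depth == 0:
            sections.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    sections.append("".join(current).strip())

    # Drop empty sections and case-insensitive duplicates, keeping first occurrences.
    seen, unique = set(), []
    for section in sections:
        key = section.lower()
        if section and key not in seen:
            seen.add(key)
            unique.append(section)
    return unique


print(split_prompt("masterpiece, (red hair, blue eyes:1.1), portrait,, Masterpiece, [forest]"))
```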

Repository

resume-otimizator

Engine behind the AI Resume Optimizer that compares job posts with CVs and powers the Chrome extension.

NLP Career Chrome Ext
Open on GitHub

Repository

whatsapp-rpg-gm

Automated Dungeon Master for WhatsApp that manages character sheets, dice rolls, and full D&D storytelling with AI.

Python Bots RPG
Open on GitHub
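Dice handling in a bot like this typically starts with parsing standard notation such as 2d6+3. This is a hypothetical Python sketch, not the bot's code; passing a seeded random.Random instance makes rolls reproducible in tests.

```python
import random
import re
from typing import Optional

# Standard "NdS+M" notation, e.g. "2d6+3", "d20", "4d8-1".
DICE_RE = re.compile(r"^\s*(\d*)d(\d+)\s*([+-]\s*\d+)?\s*$", re.IGNORECASE)


def roll(expr: str, rng: Optional[random.Random] = None) -> tuple:
    """Roll a dice expression; returns (individual rolls, total including modifier)."""
    if rng is None:
        rng = random.Random()
    match = DICE_RE.match(expr)
    if not match:
        raise ValueError(f"not a dice expression: {expr!r}")
    count = int(match.group(1) or 1)  # "d20" means one die
    sides = int(match.group(2))
    modifier = int(match.group(3).replace(" ", "")) if match.group(3) else 0
    if count < 1 or sides < 2:
        raise ValueError("need at least one die with two or more sides")
    rolls = [rng.randint(1, sides) for _ in range(count)]
    return rolls, sum(rolls) + modifier


rolls, total = roll("2d6+3", random.Random(7))
print(rolls, total)
```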

Repository

windows-event-analyzer

Web tool that interprets Windows Event Viewer logs, flags critical signals with AI, and speeds up troubleshooting.

Observability Gemini AI Logs
Open on GitHub

Let's Talk

Get in touch to discuss projects and opportunities

LinkedIn

Connect with me

alex-des-santos

GitHub

See my projects

alex-des-santos

Email

Send me a message

eu@alexandre.pro