AI-Driven Soccer Formations Analysis: Optimizing Team Formations Through Deep Learning (Part II)

Sep 18, 2024

Delving Deeper into AI’s Role in Football Strategy through Model Optimization and Predictive Accuracy

Summary

This article explores how AI can revolutionize football strategy by predicting and optimizing team formations based on real player data. By analyzing match data — such as player positions, attributes like sprint speed and passing accuracy, and team formations — AI provides coaches with tactical insights that go beyond traditional intuition.

Using machine learning techniques, the article details how a neural network can predict winning formations with 78.3% accuracy and a 78.2% F1 score. Real-world examples, such as Liverpool’s 4–0 win over Barcelona, illustrate how data-driven decisions can enhance team performance.

The article also highlights potential real-time applications of AI in football, offering coaches dynamic formation suggestions based on live match conditions. With further refinements, AI could revolutionize football tactics, making data a critical part of strategy planning.

Introduction: A Deep Dive into the Project and Real Football

In football, team formations are the key pillars of match strategy. Traditionally, coaches rely on their experience, intuition, and insights into the opposition. However, AI introduces an exciting new dimension to the planning and execution of team formations.

This second part of our project focuses on using AI to predict and optimize team formations by analyzing player attributes, providing tactical insights that can give teams a competitive edge.

Real-World Examples: How Teams Adjust Formations Based on Player and Opponent Data

A striking example of formation adjustments can be seen in the 2019 Champions League semi-final between Liverpool and Barcelona. Jürgen Klopp opted for a dynamic 4–3–3 formation, exploiting the blistering pace of Sadio Mané while applying intense high-press tactics, ultimately dismantling Barcelona and securing a legendary 4–0 triumph (Coaches’ Voice, Tactical Analysis: Liverpool 4 Barcelona 0, coachesvoice.com).

Similarly, Pep Guardiola’s Manchester City often adjusts its formation to break down defensively solid teams. Against such opponents, Guardiola has employed a 3–2–4–1 system, using midfield creativity to open up defenses (Breaking The Lines, Pep Guardiola’s Premier League Triumph: Mastering the 3–2–4–1 Formation).

Liverpool vs. Barcelona formations in the 2019 Champions League semi-final

How AI Improves Tactical Formation Decisions

By carefully analyzing each team’s attributes, AI empowers tactical flexibility. It can recommend tailored formations for different match scenarios, giving coaches data-backed insights that go beyond mere instinct, offering a deeper, more strategic level of decision-making.

Data Sources and Context

In this second part of our AI-driven soccer formations analysis, we delve deeper into the data that fuels our model. Unlike the previous article where we introduced the broader scope of AI in football, here we focus on the actual game data that helps predict and optimize formations. Specifically, we’re working with match data that includes player positions, team results, and detailed player attributes.

This section will explain the key elements of the dataset, the methodology for constructing formations from player positions, and how we transformed the data to make it usable for our machine learning models.

Deep Insights into Match Data: Unlocking the Key Metrics Behind Team Performance

All the data in this article comes from the European Soccer Database (kaggle.com), a comprehensive collection of football data from 11 European countries, covering over 25,000 matches and more than 10,000 players from the 2008–2016 seasons.

Match Data

The core of our analysis comes from detailed information about each match. The data we’re using includes:

Season: The year or season in which the match took place. This is crucial because player performance, team tactics, and formations can evolve from season to season.
Match Result: For each match, we capture the goals scored by both the home and away teams. This allows us to determine the winning team, which is essential for our analysis since we focus on the formations of the winning team.
Player IDs (API IDs): Each player who participated in the match is identified by a unique API ID. This allows us to link the player’s in-game position and their season statistics.
Player Positions (Y-axis): The Y-axis position of each player on the pitch provides insight into their role and placement within the team’s tactical formation.

Player Attributes

In addition to knowing where each player is positioned, we have access to detailed player statistics that allow us to evaluate their individual performances. Examples of these attributes include:

Height and Weight: Physical attributes that may influence a player’s role, such as being a central defender or a striker.
Overall Rating and Potential: Metrics that quantify a player’s current abilities and their potential for improvement.
Offensive and Defensive Work Rate: These metrics help determine the level of effort a player contributes to attacking or defending phases of play.
Skills such as Passing, Dribbling, and Reflexes: These detailed statistics paint a picture of the player’s technical abilities, such as crossing accuracy, sprint speed, or goalkeeping reflexes.

Breaking Down Team Structures: How Player Coordinates Define Formations

The next critical step in our process is constructing the team formations. Since we have the X and Y coordinates of each player’s position on the field, we categorize players based on their Y-axis placement. This allows us to reconstruct the team’s tactical formation. Here’s how we define the Y-axis positions:

Y < 4: These are typically the defenders, positioned closest to their own goal.
Y between 4 and 9 (inclusive): These players are generally midfielders, operating in the center of the field.
Y > 9: These are attackers, positioned closest to the opponent’s goal.

By using this categorization, we can determine whether a team played a 4–4–2, 4–3–3, or even a more complex 3–5–2 formation, depending on how the players were distributed along the pitch. This segmentation is crucial for our formation analysis, helping us to visualize and optimize how teams should be structured.
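To make the rule concrete, here is a minimal sketch of how a formation string could be derived from the Y coordinates. It is not the project’s actual code. The function names are hypothetical, and it assumes the goalkeeper sits at Y = 1 and is left out of the count, which the text above does not state.

```python
from collections import Counter

def line_of(y: int) -> str:
    """Map a Y coordinate to a line using the thresholds described above."""
    if y < 4:
        return "defense"
    if y <= 9:
        return "midfield"
    return "attack"

def formation_from_y(y_positions, goalkeeper_y=1):
    """Build a formation string such as '4-4-2' from 11 Y coordinates.

    Assumption: the goalkeeper is the player at Y == goalkeeper_y
    and is excluded from the defender count.
    """
    outfield = [y for y in y_positions if y != goalkeeper_y]
    counts = Counter(line_of(y) for y in outfield)
    return f"{counts['defense']}-{counts['midfield']}-{counts['attack']}"

# Hypothetical line-up: keeper, four defenders, four midfielders, two forwards.
example_y = [1, 3, 3, 3, 3, 7, 7, 7, 7, 10, 10]
print(formation_from_y(example_y))  # 4-4-2
```

With only three bands, a shape such as 4–2–3–1 would be reported as 4–5–1, so finer formations would need additional thresholds.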

Visualizing Tactical Insights: Team Formations at Play

Below, we illustrate formations drawn from the European Soccer Database, focusing on one of the iconic matches between FC Barcelona and Real Madrid (Barcelona’s 5–0 victory over Real Madrid in the 2010/2011 season). The positions of each player provide insight into how both teams set up tactically during the game:

Formation of the players in the European Soccer Database for the 2010/2011 FC Barcelona vs. Real Madrid Match. The visual representation shows how players were distributed across the pitch, allowing us to see how the winning team (Barcelona) effectively dominated the midfield and attack. These insights can help coaches adjust formations based on player strengths and the opponent’s weaknesses.

Formation distribution for home and away teams in European Soccer Database matches. This chart highlights the imbalance in formations used by home and away teams, extracted from the European Soccer Database. The data shows that certain formations, particularly the 4–4–2 and 4–5–1, are far more frequent, reflecting tactical preferences observed in the dataset.

From Raw Data to Strategic Advantage: Transforming Player Stats into Tactical Gold

For our machine learning models, we needed to prepare the data in a structured way that links player statistics to the formation outcome. Here’s how we transformed the raw match data:

Focus on the Winning Team

First, we filter out all the drawn matches from the dataset. We are only interested in analyzing winning formations to determine what makes them successful. From each match, we extract the formation and statistics of the winning team.
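A small sketch of this filtering step, assuming each match is a dictionary with goal counts stored under home_team_goal and away_team_goal (the key names and sample rows are illustrative):

```python
from typing import Optional

def winning_side(home_goals: int, away_goals: int) -> Optional[str]:
    """Return 'home' or 'away' for a decided match, None for a draw."""
    if home_goals > away_goals:
        return "home"
    if away_goals > home_goals:
        return "away"
    return None

# Made-up sample rows, not taken from the database.
matches = [
    {"id": 1, "home_team_goal": 5, "away_team_goal": 0},
    {"id": 2, "home_team_goal": 1, "away_team_goal": 1},
    {"id": 3, "home_team_goal": 0, "away_team_goal": 2},
]

decided = []
for m in matches:
    side = winning_side(m["home_team_goal"], m["away_team_goal"])
    if side is not None:  # drawn matches are dropped
        decided.append({**m, "winner": side})

print([(m["id"], m["winner"]) for m in decided])  # [(1, 'home'), (3, 'away')]
```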

Player Statistics

For each match, we gather detailed statistics for all 11 players on the winning team, and the same statistics for the 11 players on the losing team. These include:

Height and Weight: To understand the physical composition of the team.
Seasonal Attributes: Including overall rating, sprint speed, passing, dribbling, and many more detailed metrics that describe the performance capabilities of each player during that particular season.

Complete Data Set Structure

The final dataset consists of 882 columns, representing the following:

Season: The year or season in which the match was played.
Winning Team Formation: The formation (e.g., 4–4–2) used by the winning team.
Player Statistics: Each player on the winning and losing teams has 40 different statistics that describe their performance capabilities, such as sprint speed, passing accuracy, and goalkeeping abilities. With 22 players, that gives 880 columns, which together with the season and formation columns makes 882.

The full structure looks like this:

Season
Winning Team Formation
Statistics for Winning Player 1: height, weight, sprint speed, etc.
Statistics for Winning Player 2
…
Statistics for Winning Player 11
Statistics for Losing Player 1
…
Statistics for Losing Player 11

By structuring the data this way, we provide the machine learning model with all the necessary information to make accurate predictions about which formations are most successful.
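The column count can be checked with a few lines of Python. The column names below are hypothetical placeholders; only the counts (11 players per side, 40 statistics each, plus season and formation) come from the description above.

```python
N_PLAYERS = 11
N_STATS = 40  # statistics per player, as stated above

# Placeholder names; the real dataset uses attribute names such as sprint speed.
stat_names = [f"stat_{i:02d}" for i in range(1, N_STATS + 1)]

columns = ["season", "winning_formation"]
for side in ("winning", "losing"):
    for player in range(1, N_PLAYERS + 1):
        columns += [f"{side}_player_{player}_{stat}" for stat in stat_names]

print(len(columns))  # 882
```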

Applying the Model to Real-World Football Strategy

In a real-world setting, we could use this model to make formation decisions based on current player stats. Instead of past match data, we would input our team’s player statistics as the “winning players” and the opponent’s stats as the “losing players.” The model would then predict the most effective formation for our team to succeed.

For example, by analyzing attributes like sprint speed, passing accuracy, and defensive skills, the model suggests which formation (e.g., 4–3–3 or 4–4–2) would maximize our team’s chances of winning against a specific opponent. This allows coaches to use data-driven insights for optimizing strategies, moving beyond intuition and offering tailored tactical advice for each match.
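As an illustration of that idea, the sketch below assembles a single input row for an upcoming fixture by placing our squad in the “winning” slots and the opponent in the “losing” slots. The function name, the numeric season encoding, and the dummy values are all assumptions for the example.

```python
N_PLAYERS, N_STATS = 11, 40

def build_feature_row(season, our_players, opponent_players):
    """Flatten one fixture into a single feature row, our team first."""
    for squad in (our_players, opponent_players):
        if len(squad) != N_PLAYERS or any(len(p) != N_STATS for p in squad):
            raise ValueError("expected 11 players with 40 statistics each")
    row = [season]
    # Our squad fills the "winning" slots, the opponent the "losing" slots.
    for squad in (our_players, opponent_players):
        for player_stats in squad:
            row.extend(player_stats)
    return row

# Dummy, already-scaled values purely for illustration.
ours = [[0.7] * N_STATS for _ in range(N_PLAYERS)]
theirs = [[0.6] * N_STATS for _ in range(N_PLAYERS)]
row = build_feature_row(2016, ours, theirs)
print(len(row))  # 881
```

The row has 881 values rather than 882 because the formation is the target the model predicts, not an input. In practice the row would also need the same preprocessing as the training data before being passed to the trained model.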

Building the AI Model: Predicting Tactical Formations

To develop our model for predicting winning football formations, we followed a systematic approach:

1. One-Hot Encoding:

We transformed categorical formation data into binary vectors using One-Hot Encoding. This method represents each formation (e.g., 4–4–2) as a separate column with binary values, crucial for our multiclass classification problem.
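A dependency-free sketch of the encoding, using a handful of example formation labels (a library encoder would typically do this in practice; which one the project used is not stated):

```python
formations = ["4-4-2", "4-3-3", "4-4-2", "3-5-2"]

# One column per distinct formation, in sorted order.
classes = sorted(set(formations))  # ['3-5-2', '4-3-3', '4-4-2']
index = {name: i for i, name in enumerate(classes)}

def one_hot(label):
    """Return a binary vector with a single 1 at the label's column."""
    vec = [0] * len(classes)
    vec[index[label]] = 1
    return vec

encoded = [one_hot(f) for f in formations]
print(encoded[0])  # [0, 0, 1]
```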

2. Train/Test Split:

The data was divided into 80% for training (12,772 matches) and 20% for testing (3,193 matches) to assess model performance on unseen data.
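The split sizes follow directly from the 15,965 decided matches. Below is a minimal sketch of a shuffled 80/20 split on row indices; the seed and the helper name are assumptions, and whether the original split was stratified by formation is not stated.

```python
import random

def train_test_split_indices(n_rows, test_percent=20, seed=42):
    """Shuffle row indices and hold out test_percent of them for testing."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = n_rows * test_percent // 100
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = train_test_split_indices(12_772 + 3_193)
print(len(train_idx), len(test_idx))  # 12772 3193
```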

3. Scaling with MinMaxScaler:

To normalize the wide range of player statistics, we applied MinMaxScaler, ensuring that each statistic was scaled between 0 and 1.
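The scaling rule is x' = (x - min) / (max - min), computed per column. The sketch below reproduces that formula in plain Python with made-up height and weight values; scikit-learn’s MinMaxScaler applies the same rule per feature.

```python
def fit_min_max(rows):
    """Learn the per-column minimum and maximum from the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def transform_min_max(rows, mins, maxs):
    """Scale each value to (x - min) / (max - min); constant columns map to 0."""
    scaled = []
    for row in rows:
        scaled.append([
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, mins, maxs)
        ])
    return scaled

# Hypothetical (height in cm, weight in kg) values for three players.
train = [[170.0, 60.0], [180.0, 75.0], [190.0, 90.0]]
mins, maxs = fit_min_max(train)
print(transform_min_max(train, mins, maxs))
# [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

Fitting the minimum and maximum on the training rows only, and reusing them for the test rows, keeps test information out of the preprocessing; test values can then fall slightly outside the 0 to 1 range.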

4. Handling Imbalance:

To address the imbalance of formations in the dataset, we applied oversampling, helping the model learn equally from both frequent and rare formations.
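The article does not say which oversampling variant was used. As one possibility, the sketch below shows simple random oversampling, which duplicates rows of the rarer formations until every class matches the largest one. The function name, seed, and toy data are assumptions.

```python
import random
from collections import Counter, defaultdict

def random_oversample(rows, labels, seed=0):
    """Duplicate random minority-class rows until all classes are equally sized."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row, label in zip(rows, labels):
        by_label[label].append(row)
    target = max(len(group) for group in by_label.values())
    out_rows, out_labels = [], []
    for label, group in by_label.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels

# Toy data: four 4-4-2 rows and a single 3-4-3 row.
rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = ["4-4-2", "4-4-2", "4-4-2", "4-4-2", "3-4-3"]
new_rows, new_labels = random_oversample(rows, labels)
print(sorted(Counter(new_labels).items()))  # [('3-4-3', 4), ('4-4-2', 4)]
```

Oversampling is usually applied to the training split only, so that duplicated rows do not leak into the test set.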

5. Dimensionality Reduction:

We utilized PCA and Boruta to reduce the feature set and focus on the most important variables, improving model efficiency.

6. Neural Network Architecture:

Our neural network consists of multiple dense layers with ELU activation, Batch Normalization, and Dropout to prevent overfitting. The output layer uses softmax to predict the most likely formation.
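To show what two of these building blocks compute, here is a plain-Python sketch of the ELU activation and the softmax output. The class list and the raw scores are hypothetical; the real network’s layer sizes and weights are not given in the article.

```python
import math

def elu(x, alpha=1.0):
    """ELU: identity for positive inputs, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

classes = ["3-5-2", "4-3-3", "4-4-2"]
logits = [0.2, 2.0, 1.1]  # hypothetical raw scores from the last dense layer
probs = softmax(logits)
predicted = classes[probs.index(max(probs))]
print(predicted)  # 4-3-3
```

The predicted formation is simply the class with the highest softmax probability.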

7. Model Training:

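The model was trained with the Adamax optimizer, together with Early Stopping and ReduceLROnPlateau: the first halts training once the monitored metric stops improving, and the second lowers the learning rate when progress stalls, which helps reach good results without overtraining.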
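The interplay of the two mechanisms can be simulated on a list of validation losses. This is a simplified sketch with hypothetical patience values, learning rate, and losses. It ignores details such as minimum improvement thresholds, cooldown periods, and restoring the best weights.

```python
def simulate_training(val_losses, lr=0.002, lr_patience=2, lr_factor=0.5, stop_patience=4):
    """Replay validation losses, halving lr on plateaus and stopping early."""
    best = float("inf")
    since_best = 0     # epochs since the best loss (early stopping counter)
    since_lr_step = 0  # stalled epochs since the last learning-rate change
    log = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, since_best, since_lr_step = loss, 0, 0
        else:
            since_best += 1
            since_lr_step += 1
            if since_lr_step >= lr_patience:
                lr *= lr_factor  # plateau detected: reduce the learning rate
                since_lr_step = 0
        log.append((epoch, loss, lr))
        if since_best >= stop_patience:
            break  # no improvement for stop_patience epochs: stop training
    return log

losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.66, 0.68, 0.69, 0.70]
log = simulate_training(losses)
print(len(log))  # 7
```

In this run the loss stops improving after epoch 3, the learning rate is halved at epochs 5 and 7, and training stops at epoch 7 instead of running all 9 epochs.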

The architecture figure outlines the model’s layers, activations, and optimizer, giving an overview of how the system processes input data (player statistics) to make formation predictions.

By using this approach, we ensure that our AI-driven formation suggestions are data-backed and can offer coaches valuable insights into optimizing team setups for future matches.

bAIcelona: Neural network architecture used to predict team formations.

Results Breakdown: How AI Shapes Football Formations

AI-Powered Predictions: Unveiling the Tactical Secrets Behind the Data

The results from our team formation optimization model demonstrate promising potential in tactical prediction:

Accuracy: 0.783
F1 Score: 0.782
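For readers less familiar with these metrics, the sketch below computes accuracy and an F1 score on a tiny made-up set of predictions. It uses macro averaging (the unweighted mean of per-class F1), which is an assumption, since the averaging behind the 0.782 figure is not specified.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true formation."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Made-up labels for six matches, for illustration only.
y_true = ["4-4-2", "4-4-2", "4-3-3", "3-5-2", "4-3-3", "4-4-2"]
y_pred = ["4-4-2", "4-3-3", "4-3-3", "3-5-2", "4-3-3", "4-4-2"]
print(round(accuracy(y_true, y_pred), 3), round(macro_f1(y_true, y_pred), 3))
# 0.833 0.867
```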

These metrics indicate that the model identifies the winning team’s formation correctly in roughly 78% of cases, with the F1 score showing a similar balance between precision and recall. Player attributes such as height, weight, and various skill statistics form the model’s input, and the results suggest that they capture key factors that contribute to successful team strategies.

Winning with AI: How Our Model Suggests the Best Formations

Balanced Performance Across Formations: One of the key strengths of the model lies in its ability to handle diverse formations, from the classic 4–3–3 to more unconventional lineups like 3–4–3. The oversampling techniques used for underrepresented formations, coupled with dimensionality reduction through PCA and feature selection through Boruta, helped keep the approach balanced across all potential strategies.

Real-World Applications: In real-life football scenarios, such as the examples drawn from the 2008–2016 dataset, the model could provide coaches with tactical suggestions that go beyond traditional intuition. By analyzing player attributes in relation to formations, the model can help anticipate which setups are more likely to yield positive results.

For instance, in matches like Real Madrid vs Barcelona (2011/2012), the model could have suggested modifying Real Madrid’s 4–2–3–1 formation to a more defensive 5–3–2 based on player matchups and attributes. Such recommendations could shift game dynamics, offering real-time adjustments based on data.

From Data to Victory: Practical Applications of AI in Football Strategy

The implementation of artificial intelligence in team formation analysis opens up exciting new avenues for football strategy. This project demonstrates that, by utilizing historical match data, advanced player statistics, and machine learning models, it’s possible to optimize team lineups to increase the likelihood of success.

Our model shows that, even in situations where a team might have traditionally relied on one formation, data-driven insights can suggest alternative tactics that leverage the strengths of specific players or expose the weaknesses of opponents.

These insights could be pivotal for both pre-game strategy planning and in-game decision-making, offering an extra layer of tactical flexibility.

Moving Forward: AI in Football and the Future of Tactical Evolution

Expanding Data Sets: While the current model uses the European Soccer Database (2008–2016), integrating more recent data would enhance its ability to reflect modern football trends. Additionally, including data from various leagues (e.g., Premier League, La Liga) would provide broader insights and better generalization.
Real-Time Application: With further development, this model could be adapted to function in real-time, suggesting in-game tactical adjustments based on player fatigue, ongoing match statistics, and even live player positioning data. Imagine a coach receiving data-driven formation suggestions in the middle of a Champions League final — this could revolutionize the way football decisions are made.
Further Model Refinement: The next logical step would be to refine the model by incorporating more advanced features such as player fitness tracking, weather conditions, and opponent form. This would create an even more robust system capable of making nuanced tactical recommendations.
Machine Learning with Video Analysis: Another future direction could involve combining machine learning models with video analysis of player movements and team shape during matches. This would provide an additional layer of insight, going beyond raw statistics to include real-time player behavior on the pitch.

It has been a pleasure to share this project with you :)

Applaud if you liked it, and feel free to contact me at guillemmiralles1@gmail.com with any questions you may have.