
Create setup.py: A Step-by-Step Tutorial


In software development, code eventually needs to be shared or reused. That requires well-structured packages and a way for Python to find them. The `setup.py` file is the central tool for this: it turns a collection of scripts into a dependable, installable package.

Your First `setup.py` File: A Beginner’s Guide

Start package development in a clean, empty directory, which prevents confusion or conflicts with existing files. The first step is to create a file named `first.py` that contains a single basic function. The file is simple, but it holds the code that other packages will later import, so it is the foundation for everything that follows.
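As a minimal sketch, `first.py` might contain nothing more than the following. The function name `first_function` and its message are placeholders chosen for illustration, not names prescribed by this tutorial.

```python
# first.py
# A deliberately tiny module: one function that other code can import.


def first_function():
    """Return a short message confirming that the function was called."""
    return "This is the first function"


if __name__ == "__main__":
    # Quick manual check when the file is run directly.
    print(first_function())
```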

With `first.py` in place, the next step is to write a `setup.py` file. It holds the basic settings that define how the package is structured, distributed, and used, including essential information such as the package name, version, and dependencies. Through this file the package is integrated into the Python ecosystem and becomes installable and distributable.
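A bare-bones `setup.py` for this single-module project could look like the text held in `SETUP_PY` below. The sketch keeps the file contents in a string and only checks that they parse, so it can be run without invoking setuptools. The distribution name `first_package` and the version `0.1` are assumed values.

```python
import ast

# Contents of a minimal setup.py. Name and version are placeholders.
SETUP_PY = '''\
from setuptools import setup

setup(
    name="first_package",   # distribution name shown by pip (assumed)
    version="0.1",          # assumed starting version
    py_modules=["first"],   # the single module first.py
)
'''

# Parsing confirms the text is valid Python without running setup().
tree = ast.parse(SETUP_PY)
setup_call = tree.body[-1].value
keyword_names = [kw.arg for kw in setup_call.keywords]
print(keyword_names)
```

Saved as `setup.py` next to `first.py`, these contents are what `pip install .` reads when it is run from the same directory.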

Starting with a single function and building up to a complete setup file reflects the iterative nature of software development. Each step lays the foundation for the next, and getting these first steps right prepares the package for distribution and reuse later on.

Cautionary Note: Safe Development Practices

Although this code is harmless, it is prudent to experiment inside a virtual environment. Anything installed there stays isolated from the system-wide Python installation, which gives a safe and controlled development space.
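One way to create such an environment is the standard library's `venv` module, which is also what `python -m venv <directory>` runs. The sketch below builds a throwaway environment in a temporary directory; the directory name `venv` is an arbitrary choice.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(env_dir):
    """Create a virtual environment and return the path of its config file."""
    # with_pip=False keeps the sketch fast; use with_pip=True for real work
    # so that pip is available inside the environment.
    venv.create(env_dir, with_pip=False)
    return Path(env_dir) / "pyvenv.cfg"


with tempfile.TemporaryDirectory() as scratch:
    config_file = create_environment(Path(scratch) / "venv")
    environment_created = config_file.exists()

print(environment_created)
```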

Expanding Your Project Structure

As a project grows from a single script into multiple files and modules, managing the pieces becomes harder. The `setup.py` file has to change accordingly, and `find_packages` from setuptools does most of the work: it automatically identifies the packages in the project directory so that they are all included in the installed package, without listing each one by hand.
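To make the discovery step concrete, the following sketch approximates what `find_packages` looks for: directories that contain an `__init__.py` file, reported as dotted names. It is a simplified stand-in written with the standard library, not the setuptools implementation, and the folder names are invented for the demonstration.

```python
import tempfile
from pathlib import Path


def discover_packages(root, prefix=""):
    """Return dotted names of directories under root that hold __init__.py."""
    found = []
    for child in sorted(Path(root).iterdir()):
        if child.is_dir() and (child / "__init__.py").is_file():
            name = prefix + child.name
            found.append(name)
            # Only descend into directories that are themselves packages.
            found.extend(discover_packages(child, name + "."))
    return found


with tempfile.TemporaryDirectory() as scratch:
    root = Path(scratch)
    (root / "my_package" / "utils").mkdir(parents=True)
    (root / "my_package" / "__init__.py").touch()
    (root / "my_package" / "utils" / "__init__.py").touch()
    (root / "docs").mkdir()  # no __init__.py, so it is not a package
    packages = discover_packages(root)

print(packages)
```

In an actual `setup.py`, the equivalent line is `packages=find_packages()` inside the `setup(...)` call, after `from setuptools import setup, find_packages`.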

This structure supports both relative and absolute imports in the project. Relative imports, written with a leading dot, refer to modules by their position relative to the current module inside the same package; they work because the package's internal structure is clearly defined and recognized by Python. Absolute imports spell out the full path starting from the top-level package name. Once the package is installed, they work from any location, inside or outside the package, because Python knows where the package lives.
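The difference between the two import styles can be shown with a throwaway package. In this sketch the package name `demo_imports_pkg` and its modules are hypothetical, and adding the temporary directory to `sys.path` stands in for installing the package.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root, name="demo_imports_pkg"):
    """Write a small package whose modules use both import styles."""
    pkg = Path(root) / name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "helpers.py").write_text("def greet():\n    return 'hello'\n")
    # Relative import: resolved from the current module's own package.
    (pkg / "uses_relative.py").write_text("from .helpers import greet\n")
    # Absolute import: spelled out from the top-level package name.
    (pkg / "uses_absolute.py").write_text(f"from {name}.helpers import greet\n")
    return name


with tempfile.TemporaryDirectory() as scratch:
    package_name = build_demo_package(scratch)
    sys.path.insert(0, scratch)  # stand-in for installing the package
    try:
        relative = importlib.import_module(f"{package_name}.uses_relative")
        absolute = importlib.import_module(f"{package_name}.uses_absolute")
        same_function = relative.greet is absolute.greet
        greeting = absolute.greet()
    finally:
        sys.path.remove(scratch)

print(same_function, greeting)
```

Both modules end up with the very same `greet` function; only the way the import is written differs.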

Using `find_packages` therefore brings the organization that larger projects need. It keeps the codebase manageable and lets developers concentrate on writing code instead of on module bookkeeping and import paths. Moving to this structure is an important step toward a scalable and maintainable project.

Embracing Development Mode

The setup process is especially useful during development, when the code is installed in development mode (for example with `pip install -e .` or `python setup.py develop`). In this mode the installed package points at the source directory, so edits to the code take effect without reinstalling the package. This short feedback loop speeds up development and debugging.

Development mode does require some care with the Python interpreter. For changes to be picked up, the interpreter has to be restarted, because Python loads a module into memory the first time it is imported and does not reload it automatically when the source changes. Restarting ensures that the latest version of the code is loaded and executed.

Development mode thus trades a little discipline for a lot of flexibility. Developers who remember to restart the interpreter after editing the code get an efficient workflow in which their changes are always reflected accurately.
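The caching behaviour described above can be observed in a few lines. The sketch below edits a hypothetical module `live_module` after importing it and shows that the old value stays in memory. It then uses `importlib.reload`, a standard-library alternative to restarting for a single module, although a restart remains the more dependable choice when several modules depend on each other.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # keep the demo free of cached .pyc files

with tempfile.TemporaryDirectory() as scratch:
    module_file = Path(scratch) / "live_module.py"  # hypothetical module
    module_file.write_text("VALUE = 'old'\n")
    sys.path.insert(0, scratch)
    try:
        import live_module

        before = live_module.VALUE

        # Edit the source, as a developer would in an editor.
        module_file.write_text("VALUE = 'edited'\n")
        unchanged = live_module.VALUE  # memory still holds the old value

        importlib.invalidate_caches()
        importlib.reload(live_module)  # re-executes the module's source
        after = live_module.VALUE
    finally:
        sys.path.remove(scratch)

print(before, unchanged, after)
```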

Understanding Code Deployment

It helps to know what happens under the hood when code is installed. Within a virtual environment, installing a package in development mode leaves traces such as `.egg-link` files in `site-packages` (recent versions of pip and setuptools may create `.pth` files instead). These files point Python to the directory where the source code lives. The difference between the two installation types shows up in the contents of the `site-packages` folder: a standard installation copies the package files there, while a development installation only records where to find them.
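A quick way to look for these traces is to list the link-style files in `site-packages`. The helper below is a sketch: it treats any `.egg-link` or `.pth` file as a possible marker, which is an assumption, since `.pth` files are also used for other purposes. The demonstration runs on a fake directory first and then on the running interpreter's own `site-packages`.

```python
import sysconfig
import tempfile
from pathlib import Path


def editable_markers(site_packages):
    """List files that may indicate a development-mode install."""
    markers = []
    for entry in sorted(Path(site_packages).iterdir()):
        if entry.suffix in {".egg-link", ".pth"}:
            markers.append(entry.name)
    return markers


# Demonstration on a fake site-packages directory.
with tempfile.TemporaryDirectory() as scratch:
    fake_site = Path(scratch)
    (fake_site / "my_package.egg-link").touch()
    (fake_site / "numpy").mkdir()  # a regular, copied package
    found = editable_markers(fake_site)
print(found)

# "purelib" is the site-packages directory of the running interpreter.
real_site = Path(sysconfig.get_paths()["purelib"])
if real_site.is_dir():
    print(real_site, editable_markers(real_site))
```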

Utilizing `pip` for Installation

The next step is to install the package with `pip`. This widely used tool installs Python packages from various sources, including the Python Package Index (PyPI) and local directories. It is simple to use, but two details deserve attention: the state of the `site-packages` folder and the name given to the package.

The first detail concerns files that already exist in the `site-packages` directory. If an earlier version of the package is present, for example from a development-mode installation, remove it before installing (uninstalling it with `pip uninstall` is usually enough) to avoid conflicts or unexpected behavior. Starting from a clean state ensures that the version being imported is the one `pip` just installed. The second detail is the name specified in the `setup.py` file. It should reflect what the package does and must not clash with existing packages. A well-chosen name makes the package easier to find and prevents confusion in the Python ecosystem.

Installing with `pip` looks straightforward, but a smooth installation depends on getting these details right.
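Before settling on a name, it can be checked against what is already present in the current environment. The sketch below uses only the standard library and checks two things that are easy to confuse: the import name and the distribution name. It inspects the local environment only and does not query PyPI; the candidate name is a made-up example.

```python
import importlib.util
from importlib import metadata


def import_name_in_use(name):
    """True if a top-level module or package called `name` is importable."""
    return importlib.util.find_spec(name) is not None


def distribution_installed(name):
    """True if an installed distribution called `name` is registered."""
    try:
        metadata.version(name)
    except metadata.PackageNotFoundError:
        return False
    return True


candidate = "my_package_name_candidate_12345"  # hypothetical name
stdlib_clash = import_name_in_use("json")      # would shadow the stdlib
candidate_taken = import_name_in_use(candidate) or distribution_installed(candidate)

print(stdlib_clash, candidate_taken)
```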

Advanced Installation: GitHub and Beyond

A repository that contains a `setup.py` file can also be installed directly from an online host such as GitHub. This requires attention to details like the repository path and, when the package lives in a subdirectory of the repository, additional arguments in the install command.
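As a sketch of what such an install target looks like, the helper below assembles a pip requirement for a GitHub repository, using pip's `git+https://` scheme and its `#subdirectory=` fragment. The user, repository, and directory names are placeholders, and the command is only printed, not executed.

```python
import sys


def github_requirement(user, repo, subdirectory=None, ref=None):
    """Build a pip-installable URL for a GitHub repository."""
    url = f"git+https://github.com/{user}/{repo}.git"
    if ref:
        url += f"@{ref}"  # branch, tag, or commit
    if subdirectory:
        # Needed when setup.py is not at the root of the repository.
        url += f"#subdirectory={subdirectory}"
    return url


# Placeholder names: replace with a real user, repository, and folder.
requirement = github_requirement("example-user", "example-repo", subdirectory="python")
command = [sys.executable, "-m", "pip", "install", requirement]
print(" ".join(command))
```

Passing the list to `subprocess.run(command, check=True)` would perform the installation; that step is left out so the sketch has no side effects.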

The creation of a `setup.py` file is not just a step in package development but a gateway to a world of efficient code sharing, organization, and deployment. It simplifies the import process, bridges the gap between development and distribution, and opens doors to global collaboration.

Enhancing Package Functionality with Entry Points

So far the package can only be used by importing it from other scripts. The next goal is to run programs from the package directly, and this is what entry points are for.

Implementing Entry Points for Direct Execution

The first step is to create a new file, `start.py`, in the top directory of the package, `my_package`. This file defines a function, `main`, which does nothing until it is called. To make it directly executable, `setup.py` gains a new argument, `entry_points`, which uses `console_scripts` to create a command that can be run from the terminal. The command, named `my_start`, points to the `main` function in `start.py`.

  • Entry Point Installation and Execution. Once the package is reinstalled with these new settings, `my_start` can be run directly from the terminal, which calls `main` without any import step. The location of the generated command depends on the operating system: it sits in the environment's `bin` folder on Linux and macOS, and appears as an `.exe` file in the `Scripts` folder on Windows;
  • Expanding Entry Point Capabilities. Entry points are not limited to console scripts. For applications with a user interface, `gui_scripts` can be used, especially on Windows, to run programs without a terminal window in the background. A package can also define several entry points, one for each command it should expose.
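The pieces described in this section can be sketched as follows. The body of `main` is a placeholder, and `split_entry_point` is a hypothetical helper added only to show how the `command = module:function` string is read; setuptools does its own parsing.

```python
# my_package/start.py (sketch)
def main():
    """Function that the `my_start` command will call."""
    print("my_package started")  # placeholder behaviour
    return 0


# Value for the entry_points argument of setup() in setup.py.
ENTRY_POINTS = {
    "console_scripts": [
        # "<command name> = <module path>:<function name>"
        "my_start = my_package.start:main",
    ],
    # "gui_scripts" takes entries of the same form for windowed programs.
}


def split_entry_point(spec):
    """Split 'command = module:function' into its three parts."""
    command, target = (part.strip() for part in spec.split("=", 1))
    module, function = target.split(":", 1)
    return command, module, function


parsed = split_entry_point(ENTRY_POINTS["console_scripts"][0])
print(parsed)
```

In `setup.py` the dictionary is passed as `setup(..., entry_points=ENTRY_POINTS)`, or written inline in the call.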

Bridging Entry Points with Python Direct Execution

For consistency and usability, a new file, `__main__.py`, is added at the package's top-level directory. This file mirrors `start.py` but includes the `if __name__ == '__main__':` clause, which allows the package to be executed directly through Python with `python -m my_package`. This follows a common practice in other packages, where a `__main__.py` file is used together with an entry point so that both ways of starting the program behave the same.

Through these advancements, the package not only retains its foundational import capabilities but also gains a new dimension of direct executability, showcasing the evolving nature of software package development.
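The effect of a `__main__.py` file can be tried out on a scratch package. In this sketch the package name `demo_main_pkg` and the printed message are made up, and setting `PYTHONPATH` stands in for installing the package.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_package_as_script(root, package="demo_main_pkg"):
    """Create a package with __main__.py and run it via `python -m`."""
    pkg = Path(root) / package
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "start.py").write_text("def main():\n    print('started from main')\n")
    (pkg / "__main__.py").write_text(
        f"from {package}.start import main\n"
        "\n"
        "if __name__ == '__main__':\n"
        "    main()\n"
    )
    # PYTHONPATH makes the scratch package importable in the child process.
    env = dict(os.environ, PYTHONPATH=str(root))
    result = subprocess.run(
        [sys.executable, "-m", package],
        capture_output=True, text=True, env=env, check=True,
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as scratch:
    output = run_package_as_script(scratch)

print(output)
```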

Integrating Dependencies into Your Package

The next step is to declare external dependencies. This means modifying the `setup.py` file to list the third-party packages that the code needs, using the `install_requires` argument. For illustration, the package is set to require `numpy` and a version of Django earlier than 3.0. When the package is installed, pip automatically fetches suitable versions of these dependencies, which makes external libraries easy to manage.
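A `setup.py` with these two requirements might look like the text in `SETUP_PY` below. As a sketch, the file contents are kept in a string and inspected with `ast`, so nothing is installed when the example runs; the name `my_package` and version `0.1` are assumed values, and `declared_requirements` is a hypothetical helper.

```python
import ast

SETUP_PY = '''\
from setuptools import setup, find_packages

setup(
    name="my_package",
    version="0.1",
    packages=find_packages(),
    install_requires=[
        "numpy",        # any available version
        "django<3.0",   # only releases earlier than 3.0
    ],
)
'''


def declared_requirements(source):
    """Read the install_requires list out of setup.py source text."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and getattr(node.func, "id", None) == "setup":
            for keyword in node.keywords:
                if keyword.arg == "install_requires":
                    return ast.literal_eval(keyword.value)
    return []


requirements = declared_requirements(SETUP_PY)
print(requirements)
```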

Distinguishing Between Setup Files and Requirement Files

A common question is what requirement files are for when setup files already specify dependencies. Requirement files typically record the exact versions of the libraries used in an environment, which helps others replicate the developer's setup. Setup files tend to be more flexible and specify only the minimum or maximum library versions the package needs in order to work. The choice between the two depends on the intended user experience: exact replication of an environment, or freedom in choosing library versions.
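The contrast can be seen by putting the two styles side by side. The sketch below produces pinned lines for whatever is installed in the current environment, roughly what `pip freeze` prints, and compares them with the flexible specifiers a setup file would carry. The `FLEXIBLE` list repeats the example from this tutorial; `pinned_requirements` is a hypothetical helper built on `importlib.metadata`.

```python
from importlib import metadata

# Flexible ranges, as they would appear in install_requires.
FLEXIBLE = ["numpy", "django<3.0"]


def pinned_requirements():
    """Return 'name==version' lines for every installed distribution."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


pinned = pinned_requirements()
print("setup.py style:        ", FLEXIBLE)
print("requirements.txt style:", pinned[:5])
```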

  • The Flexibility of Dependency Management. The world of dependency management in Python packages offers a wide range of strategies. Developers can choose to include all dependencies in the setup file, or they might ask users to install everything from a requirements file. This decision hinges on the developer’s assessment of what best suits their users’ needs, balancing user-friendliness and environment consistency.
  • Introducing Extra Dependencies. The `setup.py` file can also define extra dependencies through the `extras_require` argument. These are optional libraries that enhance the package's functionality in certain scenarios, allowing for specialized features that only some users need. Installing them is straightforward: the name of the extra is given in square brackets after the package name in the pip command.
  • The Impact of Extra Dependencies. Extra dependencies are particularly useful when dealing with challenging installations, resource-intensive libraries, or potential library conflicts. However, developers must be mindful that installing multiple optional dependencies might still lead to conflicts.
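Extras can be sketched as a dictionary that maps a label to the optional libraries it pulls in. The labels `plots` and `docs` and their contents are invented for this example, and `install_target` is a hypothetical helper that builds the string handed to pip.

```python
# Value for the extras_require argument of setup() (labels are made up).
EXTRAS_REQUIRE = {
    "plots": ["matplotlib"],
    "docs": ["sphinx"],
}


def install_target(package, *extras):
    """Build the pip target, e.g. 'my_package[plots,docs]'."""
    unknown = [extra for extra in extras if extra not in EXTRAS_REQUIRE]
    if unknown:
        raise ValueError(f"undefined extras: {unknown}")
    if not extras:
        return package
    return f"{package}[{','.join(extras)}]"


plain = install_target("my_package")
with_plots = install_target("my_package", "plots")
print(plain, with_plots)
```

On the command line this corresponds to `pip install "my_package[plots]"`; the quotes keep the shell from interpreting the square brackets.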

Beyond the Basics: Exploring Advanced `setup.py` Features

While this guide serves as an introduction to creating a `setup.py` file, the topic is vast and multifaceted. Developers are encouraged to study the setup files of well-known libraries like scikit-learn or Django for more complex examples and inspiration. It’s also important to note that many additional arguments in the `setup.py` file, such as author and website, have been omitted in this tutorial for brevity.

  • The Path to Package Distribution. An important next step in package development is distribution, for example through PyPI. This involves understanding the folders, such as `build` and `dist`, that are generated when the package is built;
  • Looking Ahead: Conda Distribution. Another aspect of package management is distribution through Conda. Conda can also install non-Python libraries and manages dependencies for the environment as a whole. However, making a package Conda-installable involves additional steps, which will be explored in future discussions.

Conclusion: Mastering Package Development with `setup.py`

In summary, package development with a `setup.py` file combines technical steps with design decisions. Declaring dependencies shows how external libraries like `numpy` and Django are pulled in automatically when the package is installed.

The distinction between setup files and requirement files is worth remembering. Setup files take a broad, accommodating approach that is enough to guarantee basic functionality. Requirement files record specifics and mirror the developer's exact environment for precise replication. The choice between the two reflects what the end user needs.

Extra dependencies add another level of customization. They let developers support specialized use cases without burdening the main package with unnecessary or resource-intensive components.

Beyond these basics, more advanced `setup.py` configurations are best learned by reading the setup files of established libraries. After that comes distribution, whether through PyPI or Conda. Each platform has its own requirements, from the folder structure produced by the build to the steps needed for a release.

In essence, mastering the art of package development with `setup.py` is a journey of continuous learning, adaptation, and meticulous planning, aiming to create efficient, user-friendly, and robust software solutions.