Step-by-Step Guide to Building a Project Setup File

In the journey of software development, there inevitably arrives a moment when you want to share your code with others. This isn’t limited to external users; you will often need to reuse your own code on a fresh machine or in a different virtual environment. We’ve already covered why well-structured packages matter for importing in Python, and why Python must be able to locate those packages.

The role of a properly configured setup file becomes paramount in this context. With such a file in place, Python gains the clarity it needs to quickly identify the location of the package. This clarity not only streamlines the process of importing modules from different sources but also paves the way for the effortless distribution of your code to a broader audience, thereby reducing the obstacles to its wider adoption. The creation and implementation of a setup.py file is a crucial step in evolving from simple scripting to establishing a robust, package-based framework that can be reliably utilized in various scenarios.

Creating Your Initial Setup File

In this section, basic illustrative examples, similar to those discussed in Python imports, will be explored. These examples, while straightforward, effectively demonstrate the concept. Begin in a new, empty directory—its name is inconsequential, but maintaining isolation from other files is crucial for a seamless learning experience as you advance through this guide.

The first step involves crafting a file named first.py and populating it with the specified code:

def first_function():
    print('This is the first function')

We want this file to be usable from other packages and from any location on the computer. The goal is to be able to run code like the following:

from first import first_function

first_function()
# 'This is the first function'

Beside the file, let’s generate a new file named setup.py, containing the following code:

from setuptools import setup

setup(
    name='My First Setup File',
    version='1.0',
    py_modules=['first'],  # makes first.py importable; scripts=[...] would only copy the file to bin/
)

Caution: While the code under development is benign, it’s advisable to establish a virtual environment for experimentation. You can discard it once you’ve comprehended and refined the details.
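As a minimal sketch of the advice above, a throwaway environment can be created from Python itself with the standard venv module. The folder name sandbox_env is just a placeholder, and with_pip=False is chosen here only to keep the example quick; for real experiments you would want pip inside the environment.

```python
import tempfile
import venv
from pathlib import Path

# Create a disposable environment inside a temporary directory.
# "sandbox_env" is a hypothetical name; with_pip=False keeps the sketch fast
# (pass with_pip=True when you actually plan to install packages into it).
env_dir = Path(tempfile.mkdtemp()) / "sandbox_env"
venv.create(env_dir, with_pip=False)

# Every environment created this way contains a pyvenv.cfg marker file.
print((env_dir / "pyvenv.cfg").exists())
```

Once you are done experimenting, deleting the folder removes the environment entirely.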

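To install your script, run the following from the console, inside the directory that holds setup.py (recent setuptools releases flag this direct invocation as deprecated in favour of pip, which is covered further down):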

python setup.py install

You’ll see some build information on the screen. Most importantly, you can now use your module from any location on your computer. For instance, change to a different directory, start a Python interpreter, and run the following:

from first import first_function
first_function()
# This is the first function

This is excellent; now you have the capability to utilize your program from any location, including other packages.

Organizing Your Project Structure

In the previous example, the focus was on a single script, maintaining a relatively simple program structure. Nevertheless, real-world situations frequently entail multiple files, demanding a more intricate organization. To address this, the project structure should be configured to distinctly convey the required files to the setup. Expanding on the initial example, we will create a package encompassing two modules. The resulting folder structure is outlined below:

.
├── my_package
│   ├── __init__.py
│   ├── mod_a
│   │   ├── file_a.py
│   │   └── __init__.py
│   └── mod_b
│       ├── file_b.py
│       └── __init__.py
└── setup.py

Let’s create two straightforward functions within file_a and file_b, respectively:

# my_package/mod_a/file_a.py
def function_a():
    print('This is function_a')


# my_package/mod_b/file_b.py
def function_b():
    print('This is function_b')

The setup.py file needs modification to accommodate the heightened complexity of the program. Fortunately, the developers at setuptools have streamlined the process. All that is required is to include the following:

from setuptools import setup, find_packages

setup(
    name='My First Setup File',
    version='1.0',
    packages=find_packages()
)

The argument that pointed at the single file is gone, and the packages argument has been introduced. The find_packages function automatically searches a directory for packages, that is, folders containing an __init__.py file. If no arguments are provided, it searches the current folder. To see it in action, run the following in a Python interpreter started next to setup.py:

>>> from setuptools import find_packages
>>> find_packages()
['my_package', 'my_package.mod_b', 'my_package.mod_a']
>>> find_packages(where='my_package')
['mod_b', 'mod_a']

Now, as you can observe, the find_packages function produces a list containing the names of the packages to be included in the setup. 

  • For more precise control, you can substitute find_packages and explicitly list the packages you want to include in your installation.
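  • Additionally, if your code resides in an additional folder, such as ‘src’, you would need to specify it: find_packages(where='src'), and also pass package_dir={'': 'src'} to setup so that the package names are mapped to that folder.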
  • Maintaining the code directly in a folder adjacent to the setup file, with a name that unmistakably signifies it as the entire package, is a preferred practice, as demonstrated in the example in this section.
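To make the discovery rule more concrete, here is a simplified stand-in for find_packages written with the standard library only. It is not the setuptools implementation; it merely assumes the rule described above (a folder counts as a package when it contains an __init__.py, and folders that are not packages are not searched further). The function name find_packages_sketch and the docs folder are made up for the illustration.

```python
import os
import tempfile
from pathlib import Path


def find_packages_sketch(where="."):
    """Return dotted names of the folders under `where` that contain __init__.py."""
    found = []
    for root, dirs, files in os.walk(where):
        if root == where:
            continue  # the project folder itself is not a package
        if "__init__.py" not in files:
            dirs[:] = []  # not a package: do not look inside it either
            continue
        relative = os.path.relpath(root, where)
        found.append(relative.replace(os.sep, "."))
    return sorted(found)


# Rebuild the layout from this section in a temporary folder and scan it.
with tempfile.TemporaryDirectory() as project:
    for sub in ("my_package", "my_package/mod_a", "my_package/mod_b"):
        package_dir = Path(project, sub)
        package_dir.mkdir(parents=True)
        (package_dir / "__init__.py").touch()
    Path(project, "docs").mkdir()  # no __init__.py, so it is ignored
    packages = find_packages_sketch(project)

print(packages)
```

The result is sorted here for readability; the real function makes no promise about ordering, as the interpreter session above shows.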

Once more, if you simply install the code: 

python setup.py install

you will be able to utilize it:

>>> from my_package.mod_b.file_b import *
>>> function_b()
This is function_b

Once the package is installed, everything you know about importing in Python, whether relative or absolute, applies without extra work, because the installation makes the package reachable through Python’s search path.

Setting Up Development Mode Installation

A key aspect to highlight is that the setup process is beneficial not just when releasing your code, but also during the development phase. This can be effectively managed by installing your code in development mode. To accomplish this, all you need to do is execute the setup script with a specific argument:

python setup.py develop

Now, any modifications made to your files will be reflected in your code. For instance, let’s make changes to file_b.py:

def function_b():
    print('This is New function_b')

Note: Bear in mind that if you modify a module in Python, you’ll need to restart the interpreter. Once a module is imported, Python will bypass any subsequent import statements for the same packages.
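The caching behaviour in the note can be observed with a small, self-contained experiment. The sketch below writes a temporary module (demo_mod_b, a made-up name), edits it on disk, and shows that a second import statement changes nothing while importlib.reload picks up the new code. Reloading is an alternative to restarting for quick experiments, but it has caveats: names already imported with from ... import keep pointing at the old objects.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    module_file = Path(tmp) / "demo_mod_b.py"  # hypothetical module name
    module_file.write_text("def function_b():\n    return 'This is function_b'\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    import demo_mod_b
    before = demo_mod_b.function_b()

    # Edit the file on disk, as you would in your editor.
    module_file.write_text("def function_b():\n    return 'This is New function_b'\n")

    import demo_mod_b  # already imported: Python skips this statement
    cached = demo_mod_b.function_b()

    importlib.reload(demo_mod_b)  # explicitly re-executes the module
    after = demo_mod_b.function_b()

    sys.path.remove(tmp)

print(before)
print(cached)
print(after)
```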

One of the notable advantages of this approach is that it seamlessly accommodates the addition of new modules to your package. For instance, let’s generate a new module, C, which leverages the existing functionalities of modules A and B, employing two distinct import approaches. Create a folder named mod_c inside my_package, add an empty __init__.py to it (as in mod_a and mod_b, so that find_packages also picks it up), include a file named file_c.py, and insert the following:

from ..mod_a.file_a import function_a
from my_package.mod_b.file_b import function_b

def function_c():
    print('This is function c')
    function_a()
    function_b()

Now, you can observe that having a setup.py file makes the utilization of relative and absolute imports straightforward. No more concerns about the system path, Python path, etc.—everything is managed automatically. Having a setup.py, even in the early stages, significantly simplifies your workflow. You no longer need to fret about the code’s location or how to import it.

Understanding Code Location and Installation Modes

You might be curious about the whereabouts and handling of your code during its operation. Assuming you’re using a virtual environment (which is highly recommended), and you know the location of this environment (i.e., the directory where everything is stored), let’s delve deeper. Firstly, navigate to the site-packages folder. In my case, it’s located at:

venv/lib/python3.6/site-packages

Here, ‘venv’ is the name of my virtual environment and ‘python3.6’ denotes the Python version I’m currently using, though yours might differ. Initially, right after setting up a fresh virtual environment, this folder contains only a few items like pip and setuptools. Now, let’s install our package in development mode:
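If you are unsure where this folder lives for the interpreter you are running, the standard library can report it. The snippet below is a small sketch using sysconfig; the path it prints depends entirely on your system and environment.

```python
import sysconfig

# "purelib" is the install location for pure-Python packages,
# which is the site-packages folder of the active interpreter.
site_packages = sysconfig.get_paths()["purelib"]
print(site_packages)
```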

python setup.py develop

Upon doing this and revisiting the site-packages folder, you’ll observe a new file: My-First-Setup-File.egg-link. This file’s name is derived from the ‘name’ argument specified in the setup.py file. Opening this file with a text editor reveals the full path to the directory containing the my_package folder. The same path is also appended to the easy-install.pth file in site-packages, and that entry is what tells the Python interpreter about the additional path it needs in order to find our module. Although this isn’t an overly complicated process, being able to do it with a single command is very convenient.
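The underlying mechanism, a file in site-packages that lists extra folders for Python to search, can be tried out in isolation. The sketch below does not touch your real site-packages: it uses two temporary folders as stand-ins and the standard site.addsitedir function, which processes .pth files the same way the interpreter does at startup. The names demo.pth and linked_demo are invented for the example.

```python
import importlib
import site
import tempfile
from pathlib import Path

project_dir = Path(tempfile.mkdtemp())    # stands in for your project folder
fake_site_dir = Path(tempfile.mkdtemp())  # stands in for site-packages

# A module that lives only in the project folder.
(project_dir / "linked_demo.py").write_text("VALUE = 'found through a .pth file'\n")

# A .pth file holds one path per line; each existing path is added to sys.path.
(fake_site_dir / "demo.pth").write_text(str(project_dir) + "\n")

site.addsitedir(str(fake_site_dir))
importlib.invalidate_caches()

import linked_demo

print(linked_demo.VALUE)
```

Because only a path is recorded, edits to the files in the project folder are visible the next time the module is imported, which is exactly the behaviour development mode provides.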

Conversely, if you install the package using:

python setup.py install

The content within the site-packages folder exhibits notable differences. You’ll encounter a file named My_First_Setup_File-1.0-py3.6.egg. Egg files essentially function as zip files that Python can decompress and utilize as required. To inspect a copy of your code, encompassing all folders and files, alongside certain metadata about your program (like the author, license, etc., which hasn’t been included in your simple setup file yet), you can open this file using any standard zip file opener. It’s important to note that alterations made to your project files won’t automatically propagate to other programs because Python operates with its own copy of the source code. Hence, after implementing changes, rerunning the setup script is necessary:

python setup.py install

This action refreshes the code, allowing you to utilize the updated development. It’s crucial to remember to rerun the setup script, as failing to do so can lead to confusion over unresolved bugs that you thought were fixed. For ongoing development and experimentation, it’s advisable to stick with the setup.py develop approach to avoid these issues.
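That Python can import straight from a zip archive is easy to verify without building an egg. The following sketch creates a small archive with one module and imports from it; the archive and module names are placeholders, and a real egg additionally carries metadata that is not reproduced here.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path

archive = Path(tempfile.mkdtemp()) / "demo_archive.zip"  # hypothetical archive

# Store a tiny module inside the archive; this is the "copy" of the code.
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zipped_demo.py", "def hello():\n    return 'imported from a zip archive'\n")

# Putting the archive itself on sys.path lets the import system look inside it.
sys.path.insert(0, str(archive))
importlib.invalidate_caches()

import zipped_demo

print(zipped_demo.hello())
```

The imported code is whatever was stored in the archive at build time, which is why changes to the original files stay invisible until the package is installed again.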

Utilizing pip for Package Installation

To provide a comprehensive overview, it’s crucial to demonstrate that once your setup.py file is ready, you can also install your package using pip with the following command:

pip install .

When executing this, keep two points in mind.

  • First, if you previously used the python setup.py method, you need to remove the files it created in your site-packages folder, as their presence can cause pip to display a confusing error message;
  • Second, the dot (.) following the install command is essential: it tells pip to install the package located in the current directory. In general, whatever follows install identifies what to install, either a package name or, as here, a path.

A significant advantage of using pip for installation is that it automatically provides an uninstall option. To uninstall your package, use the package’s name in the uninstall command:

pip uninstall My-First-Setup-File

Here, it’s evident that the name assigned to our package differs from how it’s imported in Python. For example, though the package is named My-First-Setup-File, we import it using my_package. Such naming disparities can occasionally cause confusion. A notable example is the PySerial package, installed as PySerial but imported as serial. This incongruence may not pose an issue initially, but complications arise when two packages have different names while defining modules with identical names, as in the case of serial.
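The file names seen earlier (My-First-Setup-File.egg-link and My_First_Setup_File-1.0-py3.6.egg) follow from how the ‘name’ argument is cleaned up. The sketch below approximates that normalisation with two small helpers; it is an illustration of the rule, not the tools’ actual code, and the helper names are assumptions chosen for this example.

```python
import re


def safe_name(name):
    """Replace each run of characters other than letters, digits and dots with a hyphen."""
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)


def to_filename(name):
    """Swap hyphens for underscores, as seen in the egg file name."""
    return name.replace("-", "_")


project_name = safe_name("My First Setup File")
print(project_name)               # My-First-Setup-File
print(to_filename(project_name))  # My_First_Setup_File
```

None of this affects the import name, which is decided solely by the folder and file names inside the package.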

Imagine a situation where we rename my_package to serial and subsequently execute:

pip install .
pip install pyserial

In the site-packages directory, you would find that both your code and PySerial are mixed under the serial folder. There is no warning for this name clash, which can lead to serious issues, particularly if the top-level __init__.py file is of importance. Although this example might seem far-fetched, it’s worth noting that there actually are two different packages named pyserial and serial, each used for distinct purposes but defining the same module name. This type of situation highlights the importance of carefully considering package and module names to avoid conflicts and potential functionality issues.

Implementing Package Installation from GitHub

The creation of setup.py files unlocks the ability to install packages directly from GitHub or other online repositories. For instance, to install the code from this tutorial available on GitHub, you can execute the following command:

pip install git+https://github.com/PFTL/website_example_code.git#subdirectory=code/38_creating_setup_py

There are several important points to note about this command. 

  • Firstly, on Windows systems, the repository address should be enclosed in quotes to be correctly interpreted as a path;
  • Using git+https is the most straightforward method, but if you have your SSH keys set up, you can also use git+ssh, for example to access private repositories;
  • This technique isn’t limited to GitHub; it’s applicable to other version control platforms like GitLab and Bitbucket as well;
  •  The part up to website_example_code.git specifies the repository location;
  • In this example, the setup file is located in a subdirectory, hence the addition of #subdirectory=code/38_creating_setup_py. If the setup.py file is at the top level of the repository, this additional subdirectory information is unnecessary.

It’s important to note that in most repositories, the setup.py file is typically found at the top level, so specifying a subdirectory is often not required.
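The anatomy of the address described in the list above can be inspected with the standard URL parser. This is only a sketch for reading the pieces; it is not how pip parses requirements internally.

```python
from urllib.parse import parse_qs, urlsplit

requirement = (
    "git+https://github.com/PFTL/website_example_code.git"
    "#subdirectory=code/38_creating_setup_py"
)

parts = urlsplit(requirement)

# The scheme combines the version control tool and the transport.
vcs, _, transport = parts.scheme.partition("+")

# Everything after '#' carries options such as the subdirectory.
options = parse_qs(parts.fragment)

print(vcs, transport)               # git https
print(parts.netloc + parts.path)    # github.com/PFTL/website_example_code.git
print(options["subdirectory"][0])   # code/38_creating_setup_py
```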

Sharing code becomes straightforward with this approach. However, when distributing code, you should consider additional factors. For example, if you update your code on GitHub and then attempt to reinstall it using the same pip command, pip might report that the requirement is already satisfied. This happens because pip detects that the package is already installed. To update the package, you need to add the --upgrade argument to the pip command, which will then fetch and install the updated code.

Concluding our discussion on pip installations, there’s a valuable feature for installing packages in an editable mode. This mode is particularly useful if you intend to contribute to a package while it’s part of your project’s requirements. You can install a package in this mode using a command like:

pip install -e "git+https://github.com/PFTL/website_example_code.git#egg=my_test_package-1.0&subdirectory=code/38_creating_setup_py"

In your virtual environment, you’ll find a ‘src’ folder where the package resides. Interestingly, this package is also a git repository, allowing you to manage it like any other repository. Additionally, a file within the site-packages directory links to the code’s directory, including its subdirectory, thus integrating the package seamlessly into your environment.

Enhancing Functionality with Entry Points

So far, everything has been functioning smoothly, and you likely have numerous ideas brewing about the possibilities with a setup file. However, what has been explored thus far is just the tip of the iceberg. In the examples provided, a package was consistently installed, allowing importation from other packages—an ideal scenario for many applications. Nevertheless, in various situations, you might prefer the ability to run your program directly, eliminating the need to import from another script. This is where entry points come into play as invaluable tools.

Let’s create a new file, named start.py, at the top directory, my_package, and incorporate the following:

from .mod_c.file_c import function_c


def main():
    function_c()
    print('This is main')

This file, on its own, doesn’t perform any actions, but running the main function will generate an output. Returning to the setup.py file, include the following:

from setuptools import setup, find_packages

setup(
    name='My First Setup File',
    version='1.1',
    packages=find_packages(),
    entry_points={
        'console_scripts': [
            'my_start=my_package.start:main',
        ]
    }
)

In the setup file, a new element has been introduced: the entry_points argument. This addition is designed to create a console script, a command executable directly from the terminal. The command is named my_start, and its path is specified using the format module.file:function. The target of this path needs to be a callable, an executable function. After re-installing the package with the command:

python setup.py install

You can now run my_start directly from the terminal. You’ll find the script inside the bin folder of your virtual environment (Scripts on Windows, where an executable .exe file is generated). The function used as an entry point can be expanded to include arguments and additional functionality, but that’s a topic for another time.
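To get a feeling for what the module.file:function notation means, the sketch below resolves such a string into a callable using importlib. It is a simplified picture of what the generated script has to do, not the code setuptools actually produces; the helper name load_entry_point_target is invented, and os.path:join is used as the target only because it exists in every Python installation.

```python
import importlib


def load_entry_point_target(spec):
    """Resolve 'package.module:function' into the callable it names."""
    module_name, _, attribute_path = spec.partition(":")
    target = importlib.import_module(module_name)
    for attribute in attribute_path.split("."):
        target = getattr(target, attribute)
    if not callable(target):
        raise TypeError(f"{spec!r} does not point to a callable")
    return target


# With the package from this tutorial installed, the equivalent call would be
# load_entry_point_target('my_package.start:main')().
join = load_entry_point_target("os.path:join")
print(join("venv", "bin", "my_start"))
```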

If your application features a user interface, you might opt for gui_scripts instead of console_scripts. This choice is especially pertinent on Windows, where it enables the program to run without launching a terminal in the background, offering a more polished, professional appearance. However, this is not an essential feature.

It’s important to remember that you can define multiple entry points. If your program handles diverse tasks, you might prefer separate entry points over a single command with a lengthy list of behavior-modifying arguments. The decision on how to structure this depends on what you feel is the most user-friendly approach.

For completeness, there’s one more aspect to address. Currently, our setup with an entry point allows for running a specific task via a script, but it doesn’t enable direct execution of our package from Python. To facilitate this, let’s create a new file in the top-level directory of our package, named __main__.py, with the following content:

from .mod_c.file_c import function_c


def main():
    function_c()
    print('This is main')


if __name__ == '__main__':
    main()

It mirrors our start file but includes two additional lines at the end. Consequently, it can execute the package directly:

python -m my_package
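The effect of a __main__.py file can be checked on a throwaway package without installing anything. The sketch below builds a minimal package (demo_package, a placeholder name, with a simplified main function) in a temporary folder and runs it with the -m switch from that folder.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as workdir:
    package = Path(workdir, "demo_package")  # hypothetical package name
    package.mkdir()
    (package / "__init__.py").touch()
    (package / "__main__.py").write_text(
        "def main():\n"
        "    print('This is main')\n"
        "\n"
        "\n"
        "if __name__ == '__main__':\n"
        "    main()\n"
    )

    # 'python -m demo_package' executes demo_package/__main__.py.
    result = subprocess.run(
        [sys.executable, "-m", "demo_package"],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=True,
    )

print(result.stdout.strip())
```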

When examining other packages, it’s common to find a __main__ file along with an entry point that uses it. We can redefine our entry point as follows:

entry_points={
    'console_scripts': [
        'my_start=my_package.__main__:main',
    ]
}

Incorporating Dependencies into Your Package

At present, our package operates independently, without relying on external libraries. However, it’s often necessary to include dependencies in your project. To illustrate, let’s say we need our package to work with numpy (any version) and Django, but only versions earlier than 3.0. To achieve this, we modify the setup.py file to include these dependencies:

install_requires=[
    'numpy',
    'django<3.0'
]

When you install the package with these specifications, the installer will automatically download the most recent version of numpy and pick the newest Django release below 3.0 (a 2.2.x version), even though Django 3.0 and later are available. Managing dependencies can be quite challenging. Older versions of pip, before the dependency resolver that became the default in pip 20.3, were not particularly adept at resolving version conflicts: if two packages each required a different version of a library, the version installed last would override the previous one. This can lead to complications, especially considering that these libraries might have their own set of dependencies.
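The idea behind a constraint such as 'django<3.0' can be sketched in a few lines: among the versions on offer, keep those below the bound and take the newest. The code below is a deliberately simplified illustration that only understands plain numeric versions; real installers follow the full version specification, and the list of available versions is invented for the example.

```python
def parse_version(text):
    """Turn '2.2.9' into (2, 2, 9); only plain numeric versions are handled."""
    return tuple(int(part) for part in text.split("."))


def newest_below(candidates, upper_bound):
    """Return the highest candidate strictly below upper_bound, or None."""
    limit = parse_version(upper_bound)
    allowed = [version for version in candidates if parse_version(version) < limit]
    return max(allowed, key=parse_version) if allowed else None


# Illustrative list only, not the actual release history of any library.
available = ["2.1.15", "2.2.9", "3.0", "3.0.1"]
print(newest_below(available, "3.0"))  # 2.2.9
```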

It’s worth noting that there are more robust package managers available that are better equipped to handle complex dependency scenarios. These managers work by analyzing all dependencies and subdependencies, striving to find an optimal configuration that fulfills all the requirements of the various packages in your project. This approach can significantly simplify the process of managing a project with multiple and potentially conflicting dependencies.

Understanding the Role of Requirements Files vs. Setup Files

When beginning to work with setup files in Python projects, a common question arises about the purpose of requirements files, especially when it seems that everything can be specified within the setup file. The distinction lies in their typical usage and intent.

Requirements files are primarily used for documenting the exact environment in which a project was developed. They often list specific versions of libraries, ensuring that others can recreate a similar development environment. This precision is crucial for consistency and reproducibility in collaborative projects or when deploying applications.

On the other hand, setup files are generally more flexible. They are designed to be user-friendly, facilitating the installation and running of your code with minimal hassle. In a setup file, you’d typically specify the minimum or maximum versions of libraries needed for your code to function correctly. This approach accounts for future library updates that might otherwise break your code.

However, it’s important to remember that there are no strict rules governing the use of these files. The decision boils down to considering what would be most convenient for your users. Would specifying exact library versions (as is common in requirements files) be more helpful, or would allowing some flexibility (as seen in setup files) be better?

For developers who wish to dive into your code, a common practice is to first download the code, install all dependencies from a requirements.txt file, and then run the setup.py file. If configured correctly, the setup process shouldn’t need to download anything new, as all dependencies would already be in the environment. This ensures that both the original developer and the new user have identical versions of dependencies during development.

An alternative, though less user-friendly, is to omit requirements from your setup file and instruct users to install everything from a requirements file before using your library. While this method works, it can be inconvenient for users.

Enriching Functionality with Extra Dependencies

Setup files provide us with the flexibility to specify extra dependencies that may not always be essential but can enhance the functionality of our program in specific scenarios. These dependencies are easily defined as an extra argument for the setup function:

from setuptools import setup, find_packages

setup(
    name='My First Setup File',
    version='1.1',
    packages=find_packages(),
    entry_points={
        'console_scripts': [
            'my_start=my_package.__main__:main',
        ]
    },
    install_requires=[
        'numpy>1.18',
        'django<3.0'
    ],
    extras_require={
        'opt1': ['serial'],
        'opt2': ['pyserial'],
    }
)

To install them, execute a pip command:

pip install .[opt1]

or

pip install .[opt2]

It’s important to note that this approach also works with libraries available on PyPI. For example, scikit-learn can be installed with extra dependencies:

pip install scikit-learn[alldeps]

Extra dependencies prove to be advantageous when dealing with dependencies that are challenging to install on certain systems, consume significant resources, or present conflicts, similar to the scenario with serial and pyserial mentioned earlier. However, it’s essential to be aware that installing both opt1 and opt2 is still possible, potentially leading to conflicts.
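Inside the package, code that relies on an extra usually has to cope with the dependency being absent. One possible pattern, sketched below under the assumption that the optional feature needs the serial module, is to check for the module before using it. The function names are made up for this illustration.

```python
import importlib.util


def has_optional_dependency(module_name):
    """Report whether a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def describe_serial_support():
    # 'serial' is the module that the extras in this section would provide.
    if has_optional_dependency("serial"):
        return "serial backend available"
    return "serial backend missing: install one of the extras to enable it"


print(describe_serial_support())
```

This keeps the core of the package usable for everyone while the optional feature explains how to enable itself.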

Exploring Further After Your First setup.py File

This tutorial served as an introductory guide to developing your first setup.py file. While we’ve covered essential aspects, the topic of setup files in Python is vast, offering much more to explore. With the knowledge you’ve gained so far, you’re well-equipped to advance in your Python development journey.

  • One highly beneficial practice is to examine the setup files of well-established libraries, such as scikit-learn or Django. Although their setup files might be more complex than what you currently need, they offer valuable insights into structuring your code and can serve as inspiration. For more relatable examples, you might explore the setup files I’ve created for the Python for the Lab workshop or for the startup I’m involved with. This tutorial left out some simple yet important arguments of setup, such as author and website details, because they are easy to add and don’t affect the core discussion;
  • A logical next step, once you have a functional setup file, is to consider distributing your code via the Python Package Index (PyPI). We’ll delve into this process in an upcoming tutorial. Another intriguing aspect, which you might have noticed, involves the creation of folders like build and dist when you run python setup.py install. I encourage you to explore these folders to gain a deeper understanding of what happens during the package installation process;
  • Lastly, an interesting topic for future exploration is the distribution of your package through Conda instead of pip. Conda is a powerful package manager that can handle not only Python libraries but also non-Python dependencies more effectively than pip.  However, making your package available for installation via Conda involves additional steps, which will be covered in subsequent discussions. Stay tuned for more insights into the world of Python package development and distribution.

Conclusion

Creating a setup file is a fundamental step in professionalizing your Python project. It not only facilitates the distribution and installation of your project but also defines how it interacts with the Python ecosystem. By following these steps, you can create a basic setup file that can be expanded as your project grows and evolves.