My GSoC Journey - Part 5

Final Submission!

Posted by Jash Shah on July 20, 2022 · 7 mins read

After an adventurous five-month journey, going from knowing very little about GNU, Python packaging, Docker, build systems and CI/CD to having a Python package on PyPI, this is the final summary of all my work during GSoC 2022!

About pyGnuastro

Topic: Link
GitHub: https://github.com/Jash-Shah/pyGnuastro
PyPI: https://test.pypi.org/project/pygnuastro/
Documentation: <ADD LINK TO DOCS>
Docker Images: https://hub.docker.com/repository/registry-1.docker.io/jashshah1401/pygnuastro/tags?page=1&ordering=last_updated


Description

The pyGnuastro Python package is free software which aims to offer a wide array of functions for astronomical data manipulation and analysis. It is the Python implementation of the GNU Astronomy Utilities (Gnuastro) and aims to offer all the functionality of Gnuastro, but at a higher level of abstraction.

Need

While Gnuastro’s library and its binary programs provide extensive and fine-tuned functionality, it’s tough to deny the versatility and ease of use that a language like Python brings. Being dynamically typed and high-level, Python makes it simple for novice programmers to interact with Gnuastro’s programs. Additionally, since many of Gnuastro’s use cases require working with data arrays, integrating libraries like NumPy makes data analysis and manipulation more accessible.

Work done throughout GSoC

  1. Learned about Python extensions, the NumPy C-API, setuptools and the Gnuastro library, and tried out some examples of each.

  2. Learned about the GNU Build system, mainly Autoconf and Automake.

  3. Started by trying to package the Python modules with Gnuastro itself. The first module I tried was cosmology, since it doesn’t have any external dependencies and has only a few basic functions. Commit.

  4. However, after discussing with my mentor, we came to the conclusion that trying to combine the Python and GNU build systems wasn’t the best approach. So we decided to make a separate Python package called pyGnuastro, which would be distributed on its own and be available on PyPI!

  5. Added a Python Interface module (doc: section 12.3.29, pg. 717) to the Gnuastro library to provide the utility functions required by the Python wrappers.

  6. Learned about Python wheels. pyGnuastro depends on the Gnuastro library, which in turn has its own dependencies. So, in order to ensure a smooth installation experience for users, we use auditwheel (delocate on macOS) to build manylinux wheels. These wheels include all the dependencies in the distributed package itself, rather than depending on the users to install, build and link to them :smile:

  7. Created the structure of the Python package with the setup.py script to build the extensions.

  8. Built a custom Docker image using Maneage. Worked with my mentor to create a build script for Maneage which builds the Gnuastro library statically (repo with reproducible instructions), taking quay.io/pypa/manylinux2014_x86_64 as the base image. This ensures that only one library is included in the pyGnuastro distribution, which reduces its size, increases portability and speeds up installation :stars:

  9. Created a workflow script using GitHub workflows to build wheels for manylinux images and macOS, as well as source distributions of pyGnuastro.
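
To make step 6 a little more concrete, here is a minimal sketch of how a build script could pick the wheel-repair tool for the current platform. The function name `repair_command`, the `wheelhouse` output directory and the wheel file name are made up for illustration and are not taken from the pyGnuastro repository; only the `auditwheel repair` and `delocate-wheel` commands and their `--plat`, `-w` and `-v` options are real.

```python
import sys


def repair_command(wheel, out_dir="wheelhouse", platform=None,
                   plat_tag="manylinux2014_x86_64"):
    """Return the command that bundles shared libraries into `wheel`.

    Hypothetical helper: it only builds the argument list and does not
    run anything, so it is safe to call on any machine.
    """
    platform = sys.platform if platform is None else platform
    if platform.startswith("linux"):
        # auditwheel copies the external shared libraries into the wheel
        # and retags it with the requested manylinux platform tag.
        return ["auditwheel", "repair", "--plat", plat_tag,
                "-w", out_dir, wheel]
    if platform == "darwin":
        # delocate does the equivalent job for macOS wheels.
        return ["delocate-wheel", "-w", out_dir, "-v", wheel]
    raise RuntimeError(f"no wheel-repair tool chosen for {platform!r}")


if __name__ == "__main__":
    # Hypothetical wheel file name, used only to show the output format.
    print(" ".join(repair_command("dist/pygnuastro-example.whl",
                                  platform="linux")))
```

The returned list can be passed directly to `subprocess.run(..., check=True)` inside the container or CI job that built the wheel.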

Future Improvements

pyGnuastro is still in its ideation phase. With the major work of figuring out the build system and setting up the Docker image and the workflow done, adding more functionality should now be a quick process. The major planned improvements include:

  • More concise documentation with a focus on reproducibility
  • Add more modules; currently there are only two: cosmology and fits
  • Add support for custom data types in Python for better interfacing. For example, all the list data types in Gnuastro can have corresponding Python versions
  • Add support for more platforms. Create a system similar to the current maneage-manylinux2014_x86_64 Docker image for other manylinux images as well
  • Add integration with other packages like OpenCV, matplotlib, scipy, etc., as is already done with NumPy
  • Better infrastructure and fully automated maintenance of the GitHub repo, to make it the sole place for all information related to pyGnuastro
  • Add modules for the binary programs of Gnuastro as well, just like for the library functions. This can be done either by defining library functions which perform tasks similar to those of the binary programs, or by implementing custom modules that replicate the behaviour of the binary programs in Python
  • Perform efficiency comparisons with other implementations once the package reaches a mature state
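
As a rough idea of what the "custom data types" item above could look like, here is a small sketch that mirrors a singly linked list of doubles in Python using only the standard `ctypes` module. The field names `v` and `next` are my assumption of how Gnuastro's `gal_list_f64_t` is laid out, and the class and function names (`F64Node`, `list_to_nodes`, `nodes_to_list`) are invented for this example; none of this is part of pyGnuastro yet.

```python
import ctypes


class F64Node(ctypes.Structure):
    """One node of a singly linked list of doubles (assumed layout)."""


# The fields are assigned after the class exists so that a node can
# hold a pointer to another node of the same type.
F64Node._fields_ = [
    ("v", ctypes.c_double),
    ("next", ctypes.POINTER(F64Node)),
]


def list_to_nodes(values):
    """Build a linked list from a Python sequence, keeping its order.

    Returns the head node (or None for an empty sequence) and a list of
    all nodes, which the caller should keep alive while using the list.
    """
    head = None
    nodes = []
    for value in reversed(values):
        node = F64Node(v=float(value))
        if head is not None:
            node.next = ctypes.pointer(head)
        nodes.append(node)
        head = node
    return head, nodes


def nodes_to_list(head):
    """Walk the linked list from `head` and collect the values."""
    values = []
    node = head
    while node is not None:
        values.append(node.v)
        # A NULL pointer is falsy in ctypes, which marks the end.
        node = node.next.contents if node.next else None
    return values


if __name__ == "__main__":
    head, keepalive = list_to_nodes([1.0, 2.5, 3.0])
    print(nodes_to_list(head))
```

A real wrapper would more likely do this conversion in C inside the extension module; the sketch only illustrates the mapping between the two representations.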

Acknowledgements

I had a great time working with everyone at Gnuastro. I would especially like to thank my mentor, Mohammad Akhlaghi, and my “unofficial” co-mentor, Pedram Ashofteh, for their continuous support and motivation. The kind of community Mohammad has created with Gnuastro still amazes me. Working at Gnuastro felt more like being part of a team than an organization because of how inclusive, supportive, and collaborative the environment is. The tasks I was given were always clear, and all my questions were answered patiently and clearly, no matter how “silly” they were. No hard deadlines were ever imposed on me, which allowed me to explore the domains involved in my project more freely and made for a better experience overall. I also appreciate that my mistakes were met with concise criticism rather than reprimands. Through the mentorship, I learned both the technicalities and the philosophies behind writing good code. Even though I have not been able to meet all the deliverables mentioned in my initial proposal, I plan to keep working with Gnuastro to finish my project and to contribute to its codebase in general.