September 2024: Top 40 New CRAN Packages

Two hundred thirty new packages made it to CRAN in September. Many of them were interesting, and selecting only forty made for some difficult decisions. When there was a difficult choice, I opted in favor of the sciences. Here are my picks for the Top 40 new packages in seventeen categories: AI, Archaeology, Biology, Chemistry, Computational Methods, Data, Ecology, Genomics, Linguistics, Machine Learning, Medicine, Networks, Pharma, Physics, Statistics, Utilities, and Visualization.

AI

groqR v0.0.1: Provides a suite of functions and RStudio Add-ins leveraging the capabilities of open-source Large Language Models (LLMs) to support R developers. Features include text rewriting, translation, and general query capabilities. Programming-focused functions provide assistance with debugging, translating, commenting, documenting, and unit testing code, as well as suggesting variable and function names. Look here for examples.

Archaeology

eratosthenes v0.0.2: Estimates unknown historical or archaeological dates subject to relationships with other dates and absolute constraints, derived as marginal densities from the full joint conditional distribution. Includes rule-based estimation of the production dates of artifact types. See Collins-Elliott (2024) for background and the vignettes on aligning relative sequences and Gibbs sampling for archaeology dates.

Biology

pcvr v1.0.0: Provides functions to analyse common types of plant phenotyping data and a simplified interface to longitudinal growth modeling and select Bayesian statistics. See Kruschke (2018), Kruschke (2013) and Kruschke (2021) for background on the Bayesian methods. There are four vignettes including Bellwether workflow and Longitudinal Growth Modeling.

Plot of water use efficiency by genotype

orthGS v0.1.5: Provides tools to analyze and infer orthology and paralogy relationships between glutamine synthetase proteins in seed plants. See the vignettes Searching for Orthologous and Unraveling the Hidden Paralogous.

Chemistry

MethodOpt v1.0.0: Implements a GUI to apply an advanced method optimization algorithm to various sampling and analysis instruments. Functions include generating experimental designs, uploading and viewing data, and performing various analyses to determine the optimal method. See Granger & Mannion (2024) for details and the vignette for examples.

SCFMonitor v0.3.5: The Self-Consistent Field (SCF) calculation method is one of the most important steps in quantum chemistry calculations. See Ehrenreich & Cohen (1959). This package enables users of the Gaussian quantum chemistry software to easily read Gaussian .log files and monitor the SCF convergence and geometry optimization process. Look here for examples.

Computational Methods

XDNUTS v1.2: Implements Hamiltonian Monte Carlo for both continuous and discontinuous posterior distributions with a customisable trajectory length termination criterion. See Nishimura et al. (2020) for the original Discontinuous Hamiltonian Monte Carlo, and Hoffman et al. (2014) and Betancourt (2016) for possible Hamiltonian Monte Carlo termination criteria. The vignette offers examples.

Data

clintrialx v0.1.0: Provides functions to fetch clinical trial data from sources like [ClinicalTrials.gov](https://clinicaltrials.gov/) and the Clinical Trials Access to Aggregate Content database, with support for pagination and bulk downloads. See the vignette.

Histogram of the number of studies in each status category

ColOpenData v0.3.0: Provides tools to download and wrangle Colombian socioeconomic, geospatial, population, and climate data from DANE, the National Administrative Department of Statistics, and IDEAM, the Institute of Hydrology, Meteorology and Environmental Studies. It solves the problem of Colombian data being published on different web pages and sources by providing functions that let the user select the desired database and download it without going through an exhausting acquisition process. There are six vignettes including How to download climate data and Population Projections.

Map of Colombia showing Internet Coverage.

modgo v1.0.1: Provides functions to generate synthetic data from a real dataset using a combination of rank normal inverse transformation and the calculation of a correlation matrix. Completely artificial data may be generated through the use of the Generalized Lambda Distribution and the Generalized Poisson Distribution. See the vignette.

Correlation matrices for simulated and real data

dtmapi v0.0.2: Provides functions that allow the humanitarian community, academia, media, government, and non-governmental organizations to use the data collected by the Displacement Tracking Matrix, a unit in the International Organization for Migration. See the vignette to get started.

Ecology

douconca v1.2.1: Implements the two-step double constrained correspondence analysis (dc-CA) for analyzing multi-trait multi-environment ecological data described in ter Braak et al. (2018). This algorithm combines and extends community or sample and species-level analyses.

Plots showing impact of environmental variables

GeoThinneR v1.1.0: Provides efficient geospatial thinning algorithms to reduce the density of coordinate data while maintaining spatial relationships. Implements K-D Tree and brute-force distance-based thinning, as well as grid-based and precision-based thinning methods. See Elseberg et al. (2012) for background and the vignette for examples.

Simulated species data by longitude and latitude
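To make the grid-based thinning idea concrete, here is a minimal sketch in Python. It is not GeoThinneR's R interface: the function name `grid_thin`, the `cell_size` parameter, and the sample coordinates are all hypothetical, and the rule of keeping the first point seen in each cell is just one simple choice.

```python
from math import floor


def grid_thin(points, cell_size):
    """Keep the first point that falls into each square grid cell.

    points: iterable of (longitude, latitude) pairs.
    cell_size: side length of a grid cell, in the same units as the points.
    """
    kept = {}
    for lon, lat in points:
        # Integer index of the grid cell that contains this point.
        cell = (floor(lon / cell_size), floor(lat / cell_size))
        # setdefault stores the point only if the cell is still empty.
        kept.setdefault(cell, (lon, lat))
    return list(kept.values())


# Hypothetical coordinates: two pairs of near-duplicates plus one isolated point.
points = [(0.10, 0.10), (0.20, 0.15), (1.30, 0.40), (1.45, 0.48), (2.70, 2.90)]
thinned = grid_thin(points, cell_size=1.0)
print(thinned)
```

With a cell size of 1.0, the five points fall into three cells, so three points survive. Distance-based thinning would instead compare points pairwise or through a spatial index such as a K-D tree.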

Genomics

easybio v1.1.0: Provides a toolkit for single-cell annotation with the CellMarker2.0 database that streamlines biological label assignment in single-cell RNA-seq data and facilitates transcriptomic analysis, including preparation of TCGA and GEO datasets, differential expression analysis, and visualization of enrichment analysis results. See Wei Cui (2024) for details and the two vignettes bulk RNAsequence workflow and Single Cell Annotation for examples.

Plots showing criteria for differential expression

GenoPop v0.9.3: Implements tools for efficient processing of large, whole genome genotype data sets in variant call format including several functions to calculate commonly used population genomic metrics and a method for reference panel free genotype imputation. See Gurke & Mayer (2024) for background and the vignette to get started.

SuperCell v1.0: Provides tools to aggregate large single-cell data into a metacell dataset by merging together the gene expression of very similar cells. See the vignettes Example of SuperCell pipeline and SuperCell runs for different samples.

Linguistics

maxent.ot v1.0.0: Provides tools to fit Maximum Entropy models to phonology data. See Mayer, Tan & Zuraw and the vignette for an overview.

Machine Learning

conversim v0.1.0: Provides tools to analyze and compare conversations using various similarity measures including topic, lexical, semantic, structural, stylistic, sentiment, participant, and timing similarities. Methods are based on established research: for example, see Landauer et al. (1998), Jaccard (1912), and Salton & Buckley (1988). There are four vignettes including analyzing similarities between two long speeches and analyzing similarities in conversational sequence in one Dyad and across multiple Dyads.
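As an illustration of what a lexical similarity measure in the spirit of Jaccard (1912) computes, here is a minimal Python sketch. It is not conversim's R interface: the function name `jaccard_similarity` and the whitespace tokenization are assumptions made for the example.

```python
def jaccard_similarity(text_a, text_b):
    """Jaccard index of the two texts' word sets: |A & B| / |A | B|."""
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    if not words_a and not words_b:
        # Convention chosen here: two empty texts count as identical.
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


# The two sentences share 3 of 7 distinct words.
score = jaccard_similarity("the cat sat on the mat", "the cat lay on the rug")
print(round(score, 3))
```

The score is 1 for identical vocabularies and 0 for disjoint ones. Measures such as semantic or topic similarity need a richer representation than word sets.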

dsld v0.2.2: Provides statistical and graphical tools for detecting and measuring discrimination and bias, be it racial, gender, age, or other, including detection and remediation of bias in machine learning algorithms. See the Quick Start Guide.

Medicine

SurvMA v1.6.8: Implements a model averaging approach to predict personalized survival probabilities by using a weighted average of multiple candidate models to approximate the conditional survival function. Two scenarios of candidate models are allowed: the partial linear Cox model and the time-varying coefficient Cox model. See Wang (2023) for details and look here for an example.
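The core idea of approximating a survival function by a weighted average of candidate models can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the two exponential candidate curves and the fixed weights are made up, and the function name `averaged_survival` is hypothetical. It says nothing about how SurvMA fits its Cox candidates or chooses the weights.

```python
import math


def averaged_survival(candidates, weights, t):
    """Weighted average of candidate survival functions evaluated at time t."""
    if len(candidates) != len(weights):
        raise ValueError("need exactly one weight per candidate model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * s(t) for w, s in zip(weights, candidates))


# Two made-up candidate survival curves with constant hazards 0.1 and 0.3.
candidates = [lambda t: math.exp(-0.1 * t), lambda t: math.exp(-0.3 * t)]
weights = [0.75, 0.25]  # assumed fixed here; in practice they are estimated

s_at_zero = averaged_survival(candidates, weights, 0.0)
s_at_five = averaged_survival(candidates, weights, 5.0)
print(s_at_zero, s_at_five)
```

Because the weights are non-negative and sum to one, the averaged value at any time lies between the smallest and largest candidate values, so the result is still a valid survival probability.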

wintime v0.2.0: Provides methods to perform an analysis of time-to-event clinical trial data using various methods that calculate and compare treatment effects on ordered composite endpoints. See Troendle et al. (2024) for the details of the methods and the vignette for examples.

Networks

arlclustering v1.0.5: Implements an innovative approach to community detection in social networks using Association Rules Learning providing tools for processing graph and rules objects, generating association rules, and detecting communities based on node interactions. See El-Moussaoui et al. (2021) for details. There are eight vignettes including General Introduction and Testing WordAdjacency dataset.

ggtangle v0.0.2: Extends the ggplot2 plotting system to support network visualization for network associated data. See the vignette.

Pharma

sdtm.oak v0.1.0: Provides a framework to develop CDISC SDTM datasets in R and potentially automate the process. There are six vignettes including one on Algorithms.

Physics

rice v0.3.0: Provides functions to calibrate radiocarbon dates, to calculate different radiocarbon realms (C14 age, F14C, pMC, D14C), and to estimate the effects of contamination or local reservoir offsets. See Reimer and Reimer (2001) and Stuiver and Polach (1977) for background and the vignette for examples.

Plot showing levels of contamination in 14C estimates.

STICr v1.0: Comprises a collection of functions for processing raw data from Stream Temperature, Intermittency, and Conductivity (STIC) loggers. ‘STICr’ (pronounced “sticker”) includes functions for tidying, calibrating, classifying, and doing quality checks on data from STIC sensors. See Wheeler/Zipper et al. (2023) for background and the vignette for an introduction.

Statistics

dpasurv v0.1.0: Provides functions to implement dynamic path analysis for survival data via Aalen’s additive hazards model. See Fosen et al. (2006) for details. There is an overview and a vignette on plotting with ggplot2.

Plot showing mediator variable over time by dose response.

LearnVizLMM v1.0.0: Implements tools to summarize the characteristics of linear mixed models without data or a fitted model by converting code for fitting nlme::lme() and lme4::lmer() models into tables, equations, and visuals. See the vignette for details.

lnmixsurv v3.1.6: Combines the mixture distributions of Fruhwirth-Schnatter (2006) and the data augmentation techniques of Tanner and Wong (1987) to implement Bayesian survival models that accommodate different behavior over time and consider higher censored survival times. There are five vignettes including a Get started guide.

Path.Analysis v0.1: Provides functions for conducting sequential path coefficient analysis and testing direct effects and functions for estimating correlation, drawing correlograms, heatmaps, and path diagrams. See Arminian et al. (2008) for background and the vignette for examples.

Utilities

charcuterie v0.0.4: Creates a new chars class which looks like a string but is actually a vector of individual characters, making strings iterable and enabling vector operations on ‘strings’ such as reverse, sort, head, and set operations. See the vignettes Example Usage and Use Cases.

The Spongebob case: every second letter upper case
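The idea of treating a string as a vector of individual characters can be illustrated with a short Python sketch, where a plain list of characters stands in for the chars class. This is not charcuterie's R interface, and the function name `spongebob_case` is hypothetical.

```python
def spongebob_case(text):
    """Upper-case every second character, counting positions from the first."""
    chars = list(text.lower())  # the string as a vector of single characters
    for i in range(1, len(chars), 2):
        chars[i] = chars[i].upper()
    return "".join(chars)


result = spongebob_case("spongebob")
print(result)

# Once the string is a vector, ordinary vector operations apply directly.
letters = list("chars")
reversed_text = "".join(reversed(letters))
sorted_text = "".join(sorted(letters))
print(reversed_text, sorted_text)
```

Note that the sketch counts every character position, including spaces and punctuation, rather than letters only.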

dtreg v1.0.0: Provides tools to interact with data type registries and create machine-readable data. See the vignette.

fctutils v0.0.7: Provides a collection of utility functions for manipulating and analyzing factor vectors in R. It offers tools for filtering, splitting, combining, and reordering factor levels based on various criteria. See the vignette.

interface v0.1.2: Provides a runtime type system, allowing users to define and implement interfaces, enums, typed data.frame/data.table, as well as typed functions. This package enables stricter type checking and validation, improving code structure, robustness, and reliability. There is a vignette and a way to support the author.

pikchr v0.97: Provides an interface to pikchr, a markup language for creating diagrams within technical documentation. See the vignette for examples.

rnix v0.12.4: Provides tools to run the Nix package manager. There are fifteen vignettes including a Getting Started Guide.

qs2 v0.1.1: Provides tools to efficiently serialize R objects using one of two compression formats: the qs2 format, which uses R serialization while optimizing compression and disk I/O, and the qdata format, which uses custom serialization to achieve slightly faster performance and better compression. The qs2 format can be directly converted to the standard RDS format. See the vignette.

Visualization

ggalign v0.0.4: Implements an extension to ggplot2 that offers various tools for organizing and arranging plots including the ability to consistently align a specific axis across multiple ggplot objects. There are seven vignettes including Examples and Heatmap Layout.

sfcurv v1.0: Implements all possible forms of 2x2 and 3x3 space-filling curves, i.e., the generalized forms of the Hilbert curve, the Peano curve and the Peano curve in the meander type. Look here for examples.
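For readers unfamiliar with space-filling curves, the classic 2x2-based Hilbert curve can be generated with the well-known index-to-coordinate conversion, sketched here in Python. This is the standard textbook algorithm, not sfcurv's implementation (which generalizes well beyond this single form), and the function names `hilbert_d2xy` and `_rotate` are hypothetical.

```python
def _rotate(s, x, y, rx, ry):
    """Rotate or flip a quadrant of side s so sub-curves join end to end."""
    if ry == 0:
        if rx == 1:
            x = s - 1 - x
            y = s - 1 - y
        x, y = y, x
    return x, y


def hilbert_d2xy(n, d):
    """Map index d (0 <= d < n*n) to (x, y) on an n-by-n grid, n a power of 2."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


# Walk the curve over a 4x4 grid: every cell is visited exactly once.
path = [hilbert_d2xy(4, d) for d in range(16)]
print(path)
```

Each step along the path moves to a horizontally or vertically adjacent cell, which is the locality property that makes such curves useful for ordering two-dimensional data.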

surreal v0.0.1: Implements the Residual (Sur)Realism algorithm described by Stefanski (2007) to generate datasets that reveal hidden images or messages in their residual plots. See README for examples.

survSAKK v1.3.1: Provides functions to incorporate various statistics and layout customization options to enhance the efficiency and adaptability of the Kaplan-Meier plots. See the vignette.