Package 'sqlm' reference manual

Title:	SQL-Backed Linear Regression
Description:	Fits linear regression models on datasets residing in SQL databases without pulling data into R memory. Computes sufficient statistics inside the database engine via a single aggregation query and solves the normal equations in R.
Authors:	Alejandro Hagan [aut, cre]
Maintainer:	Alejandro Hagan <[email protected]>
License:	MIT + file LICENSE
Version:	1.0.0
Built:	2026-06-01 07:51:09 UTC
Source:	https://github.com/usrbinr/sqlm

Glance at an lm_sql_result

Description

Extract a single-row tibble of model-level summary statistics from a fitted SQL linear model.

Usage

## S3 method for class 'lm_sql_result'
glance(x, ...)

Arguments

x

An 'lm_sql_result' object.

...

Not used.

Details

Returns R-squared, adjusted R-squared, residual standard error, F-statistic and its p-value, model degrees of freedom, log-likelihood, AIC, BIC, number of observations, and residual degrees of freedom.

Value

A single-row tibble with columns 'r.squared', 'adj.r.squared', 'sigma', 'statistic', 'p.value', 'df', 'logLik', 'AIC', 'BIC', 'nobs', and 'df.residual'.
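As an illustration of how these model-level statistics can be derived from aggregates alone, the following base R sketch recomputes them for an intercept model on the built-in 'mtcars' data. It is not the package's implementation: 'crossprod()' stands in for the SQL query, all variable names are hypothetical, and it assumes that 'df' means the numerator degrees of freedom of the F-test.

```r
# Sketch only: model-level statistics from sums and cross-products.
X <- cbind(1, mtcars$wt, mtcars$hp)   # intercept plus two predictors
y <- mtcars$mpg

n   <- nrow(X)            # COUNT(*)
p   <- ncol(X)            # number of coefficients, intercept included
XtX <- crossprod(X)       # sums of x_i * x_j
Xty <- crossprod(X, y)    # sums of x_i * y
yty <- sum(y^2)           # SUM(y * y)
sy  <- sum(y)             # SUM(y)

beta <- solve(XtX, Xty)
rss  <- yty - drop(crossprod(beta, Xty))  # residual sum of squares
tss  <- yty - sy^2 / n                    # total sum of squares about the mean

df_residual <- n - p
df_model    <- p - 1      # assumption: what glance() reports as 'df'

r_squared     <- 1 - rss / tss
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / df_residual
sigma         <- sqrt(rss / df_residual)
f_statistic   <- ((tss - rss) / df_model) / (rss / df_residual)
p_value       <- pf(f_statistic, df_model, df_residual, lower.tail = FALSE)

# Gaussian log-likelihood at the maximum-likelihood variance rss / n
log_lik <- -n / 2 * (log(2 * pi) + log(rss / n) + 1)
aic     <- -2 * log_lik + 2 * (p + 1)       # +1 for the variance parameter
bic     <- -2 * log_lik + log(n) * (p + 1)
```

The values can be checked against 'summary(lm(mpg ~ wt + hp, data = mtcars))', 'logLik()', 'AIC()', and 'BIC()'.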

SQL-Backed Linear Regression

Description

Fits a linear regression model using SQL aggregation on a remote database table. The data never leaves the database; only sufficient statistics (sums and cross-products) are returned to R.

Usage

lm_sql(formula, data, tol = 1e-07)

Arguments

formula

A formula object (e.g., 'price ~ x + cut').

data

A 'tbl_sql' object (from dbplyr).

tol

Tolerance for detecting linear dependency among the columns of the design matrix. Defaults to '1e-07'.

Details

The function computes the $X^TX$ and $X^Ty$ matrices entirely inside the database engine via a single SQL aggregation query, then solves the normal equations $X^TX\beta = X^Ty$ in R using Cholesky decomposition (falling back to the Moore-Penrose pseudoinverse for rank-deficient designs).
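The solve step can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: 'crossprod()' stands in for the SQL aggregation, 'solve_normal_eq()' is a hypothetical helper, and the way 'tol' is applied to the singular values is a guess at one reasonable convention.

```r
# Sketch only: solve X'X beta = X'y from sufficient statistics.
X <- cbind(`(Intercept)` = 1, wt = mtcars$wt, hp = mtcars$hp)
y <- mtcars$mpg

XtX <- crossprod(X)      # p x p matrix of sums of cross-products
Xty <- crossprod(X, y)   # p x 1 matrix of sums of x * y

# Hypothetical helper: Cholesky first, pseudoinverse if that fails.
solve_normal_eq <- function(XtX, Xty, tol = 1e-07) {
  R <- tryCatch(chol(XtX), error = function(e) NULL)
  if (!is.null(R)) {
    # XtX = R'R, so solve R'z = Xty, then R beta = z
    z <- forwardsolve(t(R), Xty)
    return(setNames(drop(backsolve(R, z)), rownames(Xty)))
  }
  # Fallback: Moore-Penrose pseudoinverse built from the SVD,
  # dropping singular values below tol times the largest one.
  s <- svd(XtX)
  keep <- s$d > max(s$d) * tol
  V <- s$v[, keep, drop = FALSE]
  U <- s$u[, keep, drop = FALSE]
  beta <- V %*% ((1 / s$d[keep]) * crossprod(U, Xty))
  setNames(drop(beta), rownames(Xty))
}

beta <- solve_normal_eq(XtX, Xty)
```

On this full-rank example the Cholesky path is taken and 'beta' agrees with 'coef(lm(mpg ~ wt + hp, data = mtcars))'. A nearly singular matrix can still pass 'chol()' without error, so a production implementation would need an explicit rank check.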

Supported formula features:

Numeric and categorical (character/factor) predictors with automatic dummy encoding via 'CASE WHEN'.
Interaction terms ('*' and ':') including numeric × categorical and categorical × categorical cross-products.
Dot expansion ('y ~ .') to all non-response columns.
Transforms: 'I()', 'log()', and 'sqrt()' translated to SQL equivalents ('POWER', 'LN', 'SQRT').
Date and datetime predictors automatically cast to numeric in SQL.
No-intercept models ('y ~ 0 + x').

For grouped data (via 'dplyr::group_by()'), a single 'GROUP BY' query is executed and one model per group is returned in a tibble with a 'model' list-column.

NA handling uses listwise deletion: rows with 'NULL' in any model variable are excluded via a 'WHERE ... IS NOT NULL' clause.
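To make the shape of such a query concrete, the sketch below assembles one aggregation statement for numeric predictors, including the 'IS NOT NULL' filter. The helper 'build_agg_query()', the alias names, and the exact SQL text are hypothetical; the SQL that sqlm actually generates may differ, and identifier quoting is omitted for brevity.

```r
# Sketch only: one SELECT that returns every sum needed for X'X and X'y.
build_agg_query <- function(table, response, predictors) {
  cols   <- c("1", predictors)    # "1" plays the role of the intercept column
  labels <- c("int", predictors)  # hypothetical alias fragments

  # Upper triangle of X'X, diagonal included (the matrix is symmetric)
  idx <- which(upper.tri(diag(length(cols)), diag = TRUE), arr.ind = TRUE)
  xtx <- sprintf("SUM(%s * %s) AS xtx_%s_%s",
                 cols[idx[, 1]], cols[idx[, 2]],
                 labels[idx[, 1]], labels[idx[, 2]])
  xty <- sprintf("SUM(%s * %s) AS xty_%s", cols, response, labels)
  yty <- sprintf("SUM(%s * %s) AS yty", response, response)

  # Listwise deletion: keep rows where every model variable is non-NULL
  not_null <- paste(sprintf("%s IS NOT NULL", c(response, predictors)),
                    collapse = " AND ")

  paste0("SELECT COUNT(*) AS n, ",
         paste(c(xtx, xty, yty), collapse = ", "),
         " FROM ", table, " WHERE ", not_null)
}

query <- build_agg_query("diamonds", "price", c("carat", "depth"))
cat(query, "\n")
```

For p coefficients the query returns p(p + 1)/2 cross-product sums plus p response sums, so the result is a single row whose width is independent of the table size. A grouped fit would add the grouping columns to the 'SELECT' list and a 'GROUP BY' clause.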

Value

An S7 object of class 'lm_sql_result', or a tibble with a 'model' list-column if the data is grouped.

Convert an lm_sql_result to an orbital object

Description

Creates an orbital object from a fitted SQL linear model, enabling in-database predictions without pulling data into R.

Usage

orbital.lm_sql_result(x, ..., prefix = ".pred")

Arguments

x

An 'lm_sql_result' object.

...

Not used.

prefix

Column name for predictions. Defaults to '".pred"'.

Details

Builds a single prediction expression by combining the fitted coefficients with the R expressions stored in 'term_expressions'. For categorical predictors, the expression includes 'ifelse()' calls that dbplyr translates to SQL 'CASE WHEN'. The resulting 'orbital_class' object can be used with 'orbital::predict()' to get predictions or 'orbital::augment()' to append a '.pred' column to a database table.

Value

An 'orbital_class' object.
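The idea of a single prediction expression can be illustrated in base R. The coefficients, the layout of 'term_expressions', and the data below are invented for the example and are not taken from the package internals; the sketch evaluates the expression on a local data frame rather than translating it to SQL.

```r
# Sketch only: combine coefficients and per-term expressions into one call.
coefs <- c(`(Intercept)` = 10, x = 2, cutIdeal = 1.5)   # hypothetical fit
term_expressions <- list(
  x        = quote(x),
  cutIdeal = quote(ifelse(cut == "Ideal", 1, 0))        # dummy for one level
)

# coefficient * expression for every non-intercept term
terms <- Map(function(b, e) call("*", b, e),
             coefs[names(term_expressions)], term_expressions)

# Add the terms to the intercept, giving one nested arithmetic expression
pred_expr <- Reduce(function(a, b) call("+", a, b), terms,
                    coefs[["(Intercept)"]])

new_data <- data.frame(x = c(1, 2), cut = c("Ideal", "Good"))
new_data$.pred <- eval(pred_expr, new_data)
print(pred_expr)
print(new_data)
```

Because the result is an ordinary R call built from arithmetic and 'ifelse()', it is the kind of expression that dbplyr can translate to SQL when used inside a database-backed 'mutate()'.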

Print an lm_sql_result

Description

Display a concise summary of a fitted SQL linear model.

Usage

## S3 method for class 'lm_sql_result'
print(x, ...)

Arguments

x

An 'lm_sql_result' object.

...

Not used.

Details

Prints the original function call and the named coefficient vector.

Value

Invisibly returns 'x'.

Tidy an lm_sql_result

Description

Extract a tidy tibble of per-term coefficient statistics from a fitted SQL linear model.

Usage

## S3 method for class 'lm_sql_result'
tidy(x, conf.int = FALSE, conf.level = 0.95, ...)

Arguments

x

An 'lm_sql_result' object.

conf.int

Logical. If 'TRUE', include confidence interval columns 'conf.low' and 'conf.high'. Defaults to 'FALSE'.

conf.level

Confidence level for the interval. Defaults to '0.95'.

...

Not used.

Details

Returns one row per model term with the estimate, standard error, t-statistic, and p-value. When 'conf.int = TRUE', confidence intervals are computed using the t-distribution with 'df_residual' degrees of freedom.

Value

A tibble with columns 'term', 'estimate', 'std.error', 'statistic', and 'p.value'. If 'conf.int = TRUE', also 'conf.low' and 'conf.high'.
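The per-term quantities follow from the inverse of $X^TX$ and the residual variance. The base R sketch below reproduces them on the built-in 'mtcars' data; it is an illustration with hypothetical variable names, not the package's implementation, and it returns a plain data frame rather than a tibble.

```r
# Sketch only: coefficient table from sufficient statistics.
X <- cbind(`(Intercept)` = 1, wt = mtcars$wt, hp = mtcars$hp)
y <- mtcars$mpg
n <- nrow(X)
p <- ncol(X)

XtX     <- crossprod(X)
Xty     <- crossprod(X, y)
XtX_inv <- chol2inv(chol(XtX))        # inverse via the Cholesky factor
beta    <- drop(XtX_inv %*% Xty)

rss         <- sum(y^2) - sum(beta * Xty)   # y'y - beta'X'y
df_residual <- n - p
sigma2      <- rss / df_residual            # residual variance estimate

std_error <- sqrt(sigma2 * diag(XtX_inv))
statistic <- beta / std_error
p_value   <- 2 * pt(abs(statistic), df_residual, lower.tail = FALSE)

# Two-sided interval from the t-distribution with df_residual degrees of freedom
conf_level <- 0.95
t_crit     <- qt(1 - (1 - conf_level) / 2, df_residual)

coef_table <- data.frame(
  term      = colnames(X),
  estimate  = beta,
  std.error = std_error,
  statistic = statistic,
  p.value   = p_value,
  conf.low  = beta - t_crit * std_error,
  conf.high = beta + t_crit * std_error,
  row.names = NULL
)
print(coef_table)
```

The columns can be compared with 'summary(lm(mpg ~ wt + hp, data = mtcars))$coefficients' and 'confint()' on the same fit.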
