Count Date Ranges per Year — From SQL to SPL #23
Judith-Data-Processing-Hacks

Publish Date: Apr 27

Problem description & analysis:

In the database table example, the x field is the ID and the ts field is a time interval.

source table

Task: For each ID, find which years its time interval covers and how many days of the interval fall in each of those years.

expected results

Code comparisons:

SQL

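WITH RECURSIVE days AS (
  -- Seed: one row per ID, starting at the lower bound of its range
  SELECT x, LOWER(ts) AS t FROM example
    UNION ALL
  -- Step: add one day while t is still before the upper bound
  SELECT x, t + INTERVAL '1 day' FROM days
  WHERE t < (SELECT UPPER(ts) FROM example WHERE x = days.x)
)
-- Count the generated days per ID and year
SELECT x, EXTRACT(YEAR FROM t), COUNT(*)
FROM days
GROUP BY x, EXTRACT(YEAR FROM t)
ORDER BY x, EXTRACT(YEAR FROM t)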

Most databases have no data type for time intervals, which makes an interval hard to break down. PostgreSQL provides the tsrange and daterange types, so the code is relatively easy to write. Even so, this solution relies on a recursive query to generate the date sequence, and that structure is complex and hard to follow.
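The way the recursive query grows its result can be easier to see when written as a loop. The following Python sketch mimics that evaluation under stated assumptions: the dictionary example and its dates are hypothetical stand-ins for the table, and expand_days is an illustrative name, not a database function. As in the SQL above, a row is advanced while t is before the upper bound, so the upper bound itself is also emitted.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical rows standing in for the example table: x -> (lower, upper).
example = {
    1: (date(2021, 12, 30), date(2022, 1, 2)),
    2: (date(2023, 5, 1), date(2023, 5, 3)),
}

def expand_days(table):
    """Mimic the recursive query: seed rows, then repeat the step until it yields nothing."""
    # Non-recursive term: one row per ID holding the lower bound.
    working = [(x, lower) for x, (lower, _) in table.items()]
    days = list(working)
    while working:
        # Recursive term: advance each row by one day while t < upper bound.
        working = [(x, t + timedelta(days=1)) for x, t in working if t < table[x][1]]
        days.extend(working)
    return days

days = expand_days(example)
# Final SELECT: group by (x, year), count the rows, order by x and year.
counts = sorted(Counter((x, t.year) for x, t in days).items())
print(counts)  # [((1, 2021), 2), ((1, 2022), 2), ((2, 2023), 3)]
```

Each pass of the while loop corresponds to one iteration of the recursive term. The loop stops when no row satisfies the condition any more.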

SPL: SPL can generate date sequences directly.

SPL code

A1: Load the data. A value written as […] is parsed as a sequence.

A2: For each record, generate a date sequence from its ts field, then expand the members of that sequence into a new two-dimensional table together with the record's x field. The periods function generates a sequence from a start date and an end date, and (1) refers to the first member of a sequence.
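The generate-then-expand idea described for A2 can be sketched in Python as follows. This is an analogy, not the SPL code. The periods function here is a hypothetical Python stand-in that is assumed to include both the start and the end date, the records are made up, and the final grouping step is an assumption about what follows A2. Note that Python indexes from 0, so ts[0] plays the role of ts(1).

```python
from collections import Counter
from datetime import date, timedelta

def periods(start: date, end: date) -> list[date]:
    """Hypothetical stand-in: every day from start to end, both inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]

# Hypothetical records; ts is a two-member sequence [start, end].
records = [
    {"x": 1, "ts": [date(2022, 12, 31), date(2023, 1, 2)]},
    {"x": 2, "ts": [date(2024, 2, 28), date(2024, 3, 1)]},
]

# Expand: one (x, day) row per member of each record's date sequence.
rows = [(r["x"], d) for r in records for d in periods(r["ts"][0], r["ts"][1])]

# Grouping step (an assumption about what follows A2): count rows per (x, year).
per_year = sorted(Counter((x, d.year) for x, d in rows).items())
print(per_year)  # [((1, 2022), 1), ((1, 2023), 2), ((2, 2024), 3)]
```

Because the whole date sequence is produced in one call, the expansion is a single flat pass over the records, with no recursion.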

