Changelog
Source: NEWS.md
ukbrapR v0.2.8 (05 October 2024)
ukbrapR v0.2.7 (30 September 2024)
Updates
- New function label_ukb_field() allows the user to add titles and labels to UK Biobank fields that are provided as integers but are categorical.
- New function label_ukb_fields() is a wrapper for the above. The user just provides a data frame containing UK Biobank fields, and they are all formatted with titles (and labels if categorical).
- Data from the UK Biobank schema (https://biobank.ctsu.ox.ac.uk/crystal/schema.cgi) are stored internally in ukbrapR:::ukb_schema
- {haven} dependency added for labelling
- Exported baseline_dates.tsv now also includes the assessment centres for completeness (but keeps the same filename to avoid any issues for current projects relying on already-exported files)
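As a rough illustration of what labelling an integer-coded categorical field involves, the base R sketch below attaches a title and value labels to an integer vector. The coding table sex_coding, the helper label_field_sketch(), and the example values are hypothetical and are not the package's internals; ukbrapR itself relies on {haven} and its internal schema for this.

```r
# Hypothetical coding table for an integer-coded categorical field
# (assumption for illustration; not taken from ukbrapR:::ukb_schema)
sex_coding <- data.frame(
  value   = c(0L, 1L),
  meaning = c("Female", "Male")
)

# Hypothetical helper: attach a title and value labels to an integer vector
label_field_sketch <- function(x, coding, title) {
  labelled <- factor(x, levels = coding$value, labels = coding$meaning)
  attr(labelled, "label") <- title  # variable title stored as an attribute
  labelled
}

p31 <- c(1L, 0L, 0L, NA, 1L)
p31_labelled <- label_field_sketch(p31, sex_coding, title = "Sex")

table(p31_labelled, useNA = "ifany")
attr(p31_labelled, "label")
```

A factor is used here only to keep the sketch dependency-free; a labelled vector from {haven} keeps the underlying integers as well as the labels.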
ukbrapR v0.2.6 (16 September 2024)
Bug fix
- Fix for issue #10. Grep issues if user provided only Read2 or CTV3 codes, if Read2 or CTV3 were <5 characters, or if Read2/CTV3 codes contained a hyphen. Thanks to @Simon-Leyss for highlighting.
- Fix for issue #11. When getting self-reported illness codes there was a problem joining the tables if user only provided cancer codes. Thanks to @LauricF for highlighting.
- Fix for when both types of self-reported illness codes were provided. (Incorrect subsetting to just those codes provided after pivoting the long object.)
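The exact cause of the grep problems is not described in this entry. As a general illustration only, the sketch below shows one common pitfall when pattern-matching clinical codes: Read2-style codes are padded with "." characters, which act as wildcards in a regular expression. The code strings and record values are illustrative assumptions, and this is not the package's implementation.

```r
# Illustrative Read2-style code strings (assumptions for this sketch)
codes   <- c("C10..", "1Z1..")
records <- c("C10..", "C10EF", "1Z1..", "1Z12.")

# As a regular expression, "." matches any character,
# so the pattern "^C10..$" also matches "C10EF"
regex_hits <- grepl(paste0("^", codes[1], "$"), records)

# Literal matching (fixed = TRUE) treats "." as an ordinary character
fixed_hits <- grepl(codes[1], records, fixed = TRUE)

# Exact matching against the whole code list avoids patterns altogether
exact_hits <- records %in% codes
```

Whether literal or exact matching is appropriate depends on whether child codes should also be captured, which is a design choice this sketch does not settle.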
ukbrapR v0.2.4 (05 September 2024)
Changes
- Updated internal paths for my servers indy and snow (for ongoing projects whilst we can still use local files…)
- Updated how get_diagnoses() and get_df() handle a user-provided file_paths object
ukbrapR v0.2.1 (10 August 2024)
Bug fix
- Fix for issue #5. The file paths for exported tables were not correctly specified in later calls of get_diagnoses() when the working directory is not the home directory. Thanks to @LauricF for highlighting.
ukbrapR v0.2.0 (30 July 2024)
This is a major update as I move away from using Spark as the default environment, mostly due to the cost implications; it is significantly cheaper (and quicker!) to store and search exported raw text files in the RAP persistent storage than to do everything in a Spark environment (plus the added benefit that the RStudio interface is available in “normal” instances).
The Spark functions are available as before but all updates are to improve functionality in “normal” instances using RStudio, as we move to the new era of RAP-only UK Biobank analysis.
Changes
- Added internal data frame containing default paths for exported files in a RAP project (view with ukbrapR:::ukbrapr_paths)
- Added function export_tables() which only needs to be run once when a new project is created. This submits the required table exporter commands to extract each of the tables in ukbrapR:::ukbrapr_paths. This can take ~15 minutes to export all the tables. ~10Gb of text files are created. This will cost ~£0.15 per month to store in the RAP standard storage.
- get_emr() is split into two primary underlying functions: get_emr_spark() which has not changed, and get_emr() which is the “new way” (i.e., get_emr_local() is entirely removed)
- Added functionality for hesin_oper (HES OPCS operations) searching for ICD10 codes in get_emr()
- New/updated internal functions: get_cancer_registry() ascertains cases using ICD10s in the cancer_registry data, and works much the same as get_selfrep_illness()
- New function get_diagnoses() is a wrapper to get HES diagnosis, operations, cause of death, GP, cancer registry, and self-reported illness data – i.e., one function to provide all codes to, and return all health-related data
- get_df() takes all output from get_diagnoses() i.e., now also identifies date of first in matched cancer_registry and hesin_oper entries, in addition to hes_diag, gp_clinical, death_cause and selfrep_illness as before.
- When getting “date first” using get_df() the baseline data is used to create binary case/control variables (for ever and prevalent), and for controls the censoring date is included in the _df variable (default is 30-10-2022).
To make it absolutely clear: the Spark function get_emr_spark() has not been updated but I am no longer focussed on doing things this way. If you want to submit Pull Requests to improve functions please do. The below changes are to substantially improve the experience of using exported tables in the RAP environment only (if you have all the data on a local system already it will work, assuming you format correctly and provide the paths, but the RAP is the future).
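To make the “date first” and case/control logic described in this release more concrete, here is a minimal base R sketch. All data, column names, and output variable names (cond_df, cond_bin, cond_bin_prev) are hypothetical and chosen for illustration; this is not the code inside get_df().

```r
# Hypothetical long-format matched records (one row per matched record)
records <- data.frame(
  eid  = c(1, 1, 2),
  date = as.Date(c("2012-07-15", "2005-03-01", "2015-01-20"))
)

# Hypothetical baseline data: one row per participant with assessment date
baseline <- data.frame(
  eid           = c(1, 2, 3),
  assessment_dt = as.Date(c("2008-06-01", "2009-02-10", "2010-11-05"))
)

censor_date <- as.Date("2022-10-30")  # the default censoring date, 30-10-2022

# Date of first record per participant: sort, then keep the first row per eid
records    <- records[order(records$eid, records$date), ]
date_first <- records[!duplicated(records$eid), ]
names(date_first)[names(date_first) == "date"] <- "cond_df"

# Keep every participant in baseline, including those with no record
out <- merge(baseline, date_first, by = "eid", all.x = TRUE)

# Binary variables: ever a case, and a case before baseline (prevalent)
out$cond_bin      <- as.integer(!is.na(out$cond_df))
out$cond_bin_prev <- as.integer(out$cond_bin == 1 & out$cond_df < out$assessment_dt)

# Controls receive the censoring date in the date-first variable
out$cond_df[is.na(out$cond_df)] <- censor_date

out
```

In this toy data participant 1 is a prevalent case, participant 2 is an incident case, and participant 3 is a control censored at the default date.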
ukbrapR v0.1.7 (28 July 2024)
Bug fixes
- Fix Spark database error when >1 dataset file is available. Fixes issue #3
ukbrapR v0.1.6 (03 July 2024)
Bug fixes
- Fix get_df() error when ascertaining GP diagnoses if 7-character codes were provided rather than 5
ukbrapR v0.1.5 (01 July 2024)
Bug fixes
- Fix get_df() error occurring when not all sources are desired
ukbrapR v0.1.3 (8 June 2024)
New feature
- When ascertaining multiple conditions at once it is quicker/easier to supply get_emr() with all the codes (as before), but get_df() can now be used with the option “group_by” to indicate the condition names in the codes_df object provided. See documentation.
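Conceptually, grouping by condition means searching once with every code and then separating the matched records by the condition each code belongs to. The base R sketch below illustrates that idea only; the column names (condition, vocab_id, code) and the matched records are assumptions for illustration, and the actual behaviour of get_df() should be checked in the package documentation.

```r
# Hypothetical codes data frame; column names are assumptions for this sketch
codes_df <- data.frame(
  condition = c("ckd", "ckd", "hh"),
  vocab_id  = c("ICD10", "ICD10", "ICD10"),
  code      = c("N18", "N19", "E831")
)

# Hypothetical matched records from a single search using all codes at once
matched <- data.frame(
  eid  = c(1, 1, 2, 3),
  code = c("N18", "E831", "N19", "E831"),
  date = as.Date(c("2010-01-05", "2012-03-09", "2011-06-30", "2009-09-14"))
)

# Attach the condition name to each record, then split by condition
matched      <- merge(matched, codes_df[, c("code", "condition")], by = "code")
by_condition <- split(matched, matched$condition)

# Participants with at least one matched record, per condition
lapply(by_condition, function(d) sort(unique(d$eid)))
```

Each element of by_condition could then be summarised separately, for example to derive a date of first record per condition.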
ukbrapR v0.1.2 (6 June 2024)
New features
- New function get_emr_local(). If the user has text files for hesin_diag and gp_clinical etc. these can be searched (rather than using Apache Spark queries). This can therefore work on “normal” DNAnexus nodes, or local servers. Most downstream functions also do not rely on Spark clusters if data extracts are available.
Changes
- Change URL to reflect my GitHub username change from lukepilling to lcpilling to be more consistent between different logins, websites, and social media
  - https://lcpilling.github.io/ukbrapR
  - https://github.com/lcpilling/ukbrapR
- Added dependency {cli} for improved alert/error reporting
ukbrapR v0.1.1 (6 March 2024)
New features
- New argument “prefix” for get_df() - user can provide a string to prefix to the output variable names
ukbrapR v0.1.0 (21 Feb 2024)
New features
- get_selfrep_illness() - gets illness information from self-report fields. Derives a “date first” from the age/year reported, incorporating all visits for the participant
- Two example code lists are included: codes_df_ckd (GEMINI CKD), and codes_df_hh (haemochromatosis, with self-report)
Changes
- get_emr_df() is re-named get_df() to reflect it can now include information from self-reported illness
- get_emr_diagnoses() is re-named get_emr() to reflect it actually retrieves any record in gp_clinical not just diagnoses (e.g., BMI if appropriate codes provided)
ukbrapR v0.0.2 (14 Nov 2023)
New features
- get_emr_diagnoses() - function to get electronic medical records diagnoses from Spark-based death records, hospital episode statistics, and primary care (GP) databases.
- get_emr_df() - function to get date first diagnosed with any provided code from any above Electronic Medical Record source.
Bug fixes
- Extra input checking in get_rap_phenos() and output more consistent for direct use with get_emr_*() functions
- Updated URL for example CKD clinical codes
ukbrapR v0.0.1 (26 Oct 2023)
Initial release containing two functions:
- get_rap_phenos()
- upload_to_rap()