Working with HCES Nesstar files using nesstarR

NSS rounds 64, 66, and 68 come as .Nesstar binaries packed inside .rar archives. This vignette covers downloading one round with mospiR, extracting the archive, and reading the data with nesstarR.

Prerequisites

remotes::install_github("saketkc/mospiR")
remotes::install_github("saketkc/nesstarR")

library(mospiR)
library(nesstarR)

Download from the portal

api_key <- Sys.getenv("MOSPI_KEY")

download_dataset(
  "DDI-IND-NSSO-66-SCHEDULE-1.0T2",
  file.path("data", "hces", "DDI-IND-NSSO-66-SCHEDULE-1.0T2"),
  api_key
)

The download is a .rar archive, about 54 MB for NSS 66 T2.
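Since the download depends on an API key read from the environment, it can help to fail early when the key is missing. The following is a minimal sketch: `require_env()` and `ensure_dir()` are hypothetical helpers written for this illustration, not part of mospiR, and whether `download_dataset()` creates the destination folder itself is not stated here.

```r
# Hypothetical helper (not part of mospiR): read an environment variable
# and stop with a readable message if it is unset or empty.
require_env <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", name, " is not set.", call. = FALSE)
  }
  value
}

# Hypothetical helper: create the destination directory if it is missing.
ensure_dir <- function(path) {
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  invisible(path)
}
```

With these in place, `api_key <- require_env("MOSPI_KEY")` would stop with a clear message before any network request is attempted.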

Extract the archive

Either unar (macOS/Linux) or unrar (Windows) can extract the archive. Whichever you use must be on your PATH. The example below uses unar.

rar_file <- file.path(
  "data", "hces", "DDI-IND-NSSO-66-SCHEDULE-1.0T2",
  "Nss66_1.0-type2_new format.rar"
)
system2("unar", c("-o", dirname(rar_file), shQuote(rar_file)))
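The choice between the two tools can be made at run time. The sketch below assumes the usual command-line forms (`unar -o <dir> <archive>` and `unrar x <archive> <dir>/`). `rar_command()` and `extract_rar()` are hypothetical helpers for illustration only, not functions from mospiR or nesstarR.

```r
# Hypothetical helper: build the extraction command for whichever tool is
# available. `tools` is a named character vector as returned by Sys.which(),
# where an empty string means the tool was not found.
rar_command <- function(rar_file, out_dir = dirname(rar_file),
                        tools = Sys.which(c("unar", "unrar"))) {
  if (nzchar(tools[["unar"]])) {
    # unar: -o sets the output directory
    list(command = "unar",
         args = c("-o", shQuote(out_dir), shQuote(rar_file)))
  } else if (nzchar(tools[["unrar"]])) {
    # unrar: "x" extracts with full paths; the destination ends in a separator
    list(command = "unrar",
         args = c("x", shQuote(rar_file), shQuote(paste0(out_dir, "/"))))
  } else {
    stop("Neither unar nor unrar was found on the PATH.", call. = FALSE)
  }
}

# Hypothetical wrapper: run the command and stop on a non-zero exit status.
extract_rar <- function(rar_file, ...) {
  cmd <- rar_command(rar_file, ...)
  status <- system2(cmd$command, cmd$args)
  if (status != 0) stop("Extraction failed with exit status ", status)
  invisible(status)
}
```

Separating command construction from execution keeps the tool selection easy to inspect without touching the archive.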

Extraction produces a folder with the .Nesstar binary, ddi.xml (variable metadata), and supporting documents.

Parse the Nesstar file

nesstar_parse() reads the binary header without loading any row data. Data loads only when you call nesstar_read_dataset() on a specific dataset number.

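# Locate the extracted .Nesstar binary under the extraction folder
nesstar_path <- list.files(dirname(rar_file), pattern = "\\.Nesstar$",
                           recursive = TRUE, full.names = TRUE,
                           ignore.case = TRUE)[1]
nb <- nesstar_parse(nesstar_path)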
nb
#> <nesstar_binary>
#>  File      : nss66_consumer_expenditure_type_2.Nesstar 
#>  Datasets  : 9

Dataset structure

A single .Nesstar file holds multiple datasets, one per schedule block.

nesstar_datasets(nb)
#>   dataset_number row_count variable_count
#> 1             18    100794             49
#> 2             19    100794             50
#> 3             20    468205             40
#> 4             21   4813463             33
#> 5             22   1217060             30
#> 6             23    365912             29
#> 7             24   2145291             29
#> 8             25   3076552             36
#> 9             26   3173462             29

Dataset 21 is the food block: 4.8 million item-level rows across roughly 100,000 households.
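The scale of the food block can be checked with simple arithmetic on the listing. This sketch re-enters the row counts by hand and assumes that dataset 18 holds one row per household, which the listing suggests but does not confirm.

```r
# Row counts copied from the nesstar_datasets() output above.
datasets <- data.frame(
  dataset_number = 18:26,
  row_count = c(100794, 100794, 468205, 4813463, 1217060,
                365912, 2145291, 3076552, 3173462)
)

# Assumption: dataset 18 has one row per household.
households <- datasets$row_count[datasets$dataset_number == 18]
food_rows  <- datasets$row_count[datasets$dataset_number == 21]

# Average number of item rows per household in the food block
rows_per_household <- food_rows / households
round(rows_per_household)  # about 48

# The dataset with the most rows is the food block itself
largest <- datasets$dataset_number[which.max(datasets$row_count)]
```

Under that assumption, each household contributes about 48 food item rows on average.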

Variable listing

vars <- nesstar_variables(nb, dataset_number = 21)
vars[, c("name", "variable_id", "width_value")]
#>                    name variable_id width_value
#> 1                 HH_ID        1724           9
#> 2           centre_code        1694           3
#> 3     FSU_Serial_number        1695           5
#> 4                 Round        1696           2
#> 5       Schedule_Number        1697           3
#> 6                Sample        1698           1
#> 7                Sector        1699           1
#> 8                 State        1725           2
#> 9                Region        1700           3
#> 10       State_District        1726           4
#> 11              Stratum        1702           2
#> 12          Sub_Stratum        1703           1
#> 13        Schedule_type        1704           1
#> 14            Sub_Round        1705           1
#> 15           Sub_Sample        1706           1
#> 16       FOD_Sub_Region        1707           4
#> 17         hg_sb_Number        1708           1
#> 18 Second_Stage_Stratum        1709           1
#> 19               HHS_no        1710           2
#> 20                Level        1711           2
#> 21               Filler        1712           2
#> 22            Item_code        1713           3
#> 23          HP_Quantity        1714           8
#> 24             HP_Value        1715           5
#> 25       Total_Quantity        1716           8
#> 26          Total_Value        1717           5
#> 27          Source_Code        1718           1
#> 28             Ok_stamp        1719           1
#> 29                Blank        1720           1
#> 30                  NSS        1721           2
#> 31                  NSC        1722           3
#> 32                  MLT        1723           8
#> 33           Multiplier        1727           8

Key columns in the food block:

Column	Meaning
`HH_ID`	Household identifier
`State`	State code (2-digit)
`State_District`	District code (4-digit: state × 100 + district)
`Item_code`	Food item code (NSS 66 coding)
`Total_Value`	Household expenditure on the item (Rs, monthly)
`Multiplier`	Survey weight
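The `State_District` encoding in the table can be unpacked as follows. `split_state_district()` is a hypothetical helper, and the sketch assumes the first two digits are the state and the last two the district, as the "state × 100 + district" rule implies.

```r
# Hypothetical helper: split a 4-digit State_District code into its halves.
# Results are kept as character so leading zeros survive.
split_state_district <- function(code) {
  code <- sprintf("%04d", as.integer(code))  # re-pad if zeros were dropped
  data.frame(
    state    = substr(code, 1, 2),
    district = substr(code, 3, 4),
    stringsAsFactors = FALSE
  )
}

# Both the padded form and a form that lost its leading zero decode alike.
decoded <- split_state_district(c("0109", "109"))
decoded
```

Re-padding first means the helper still works if the column was read as a number at some earlier step.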

Read a dataset

food <- nesstar_read_dataset(nb, dataset_number = 21)
cat("Rows:", nrow(food), "| Columns:", ncol(food), "\n")
#> Rows: 4813463 | Columns: 33
head(food[, c("HH_ID", "State", "State_District",
              "Item_code", "Total_Value", "Multiplier")])
#>       HH_ID State State_District Item_code Total_Value Multiplier
#> 1 844471101    01           0109       101          96    105.925
#> 2 844471101    01           0109       102         200    105.925
#> 3 844471101    01           0109       107         153    105.925
#> 4 844471101    01           0109       108         210    105.925
#> 5 844471101    01           0109       111           4    105.925
#> 6 844471101    01           0109       129         663    105.925
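One detail in the rows above is worth a quick check: the value for item 129 equals the sum of the five rows before it, which suggests that 129 is a cereal subtotal rather than a separate item. This is an inference from the six printed rows only, so treat it as an assumption to confirm against the item code list in the supporting documents. The sketch re-enters the printed values by hand.

```r
# The six rows printed by head(food) above, re-entered by hand.
sample_rows <- data.frame(
  Item_code   = c(101, 102, 107, 108, 111, 129),
  Total_Value = c(96, 200, 153, 210, 4, 663)
)

# Sum of the individual item rows (codes up to 128)
item_sum <- sum(sample_rows$Total_Value[sample_rows$Item_code <= 128])

# Value on the row assumed to be the subtotal
subtotal <- sample_rows$Total_Value[sample_rows$Item_code == 129]

item_sum == subtotal  # TRUE: 96 + 200 + 153 + 210 + 4 = 663
```

If that reading is right, summing item rows together with subtotal rows would count the same expenditure twice.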

Quick check: cereal spending by sector

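Weighted mean monthly expenditure on cereals (item codes 101-128), rural (Sector 1) vs. urban (Sector 2). Households with no cereal rows drop out of the aggregate, so these are means among households that report at least one cereal item: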

cereals    <- food[food$Item_code >= 101 & food$Item_code <= 128, ]
hh_cereals <- aggregate(Total_Value ~ HH_ID + Sector + Multiplier,
                        data = cereals, FUN = sum)
rural <- hh_cereals[hh_cereals$Sector == 1, ]
urban <- hh_cereals[hh_cereals$Sector == 2, ]

cat(sprintf(
  "Weighted mean cereal expenditure (Rs/month):\n  Rural: %.0f\n  Urban: %.0f\n",
  weighted.mean(rural$Total_Value, rural$Multiplier, na.rm = TRUE),
  weighted.mean(urban$Total_Value, urban$Multiplier, na.rm = TRUE)
))
#> Weighted mean cereal expenditure (Rs/month):
#>   Rural: 682
#>   Urban: 713
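How much the result depends on which households enter the mean can be seen on toy numbers. The data below are invented for illustration and are not survey values. The sketch compares the mean among households with cereal rows against the mean over all households, counting missing totals as zero.

```r
# Toy data, not survey values: three households, one with no cereal rows.
households <- data.frame(
  HH_ID      = c("A", "B", "C"),
  Multiplier = c(100, 300, 100)
)
cereal_totals <- data.frame(
  HH_ID       = c("A", "B"),
  Total_Value = c(600, 800)
)

# Left join keeps household C, then treat its missing total as zero.
all_hh <- merge(households, cereal_totals, by = "HH_ID", all.x = TRUE)
all_hh$Total_Value[is.na(all_hh$Total_Value)] <- 0

reporting <- all_hh[all_hh$Total_Value > 0, ]

# Among reporting households: (100*600 + 300*800) / (100 + 300)
mean_reporting <- weighted.mean(reporting$Total_Value, reporting$Multiplier)

# Over all households: (100*600 + 300*800 + 100*0) / (100 + 300 + 100)
mean_all <- weighted.mean(all_hh$Total_Value, all_hh$Multiplier)

c(reporting = mean_reporting, all = mean_all)
```

Which of the two definitions is appropriate depends on the question being asked. The quick check above uses the first.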

Export to CSV

nesstar_export() writes one CSV per dataset to output_dir.

output_dir <- file.path(tempdir(), "nss66t2")
nesstar_export(nb, output_dir = output_dir, compress = FALSE)
#> Wrote: nss66_consumer_expenditure_type_2_ds18.csv (100794 rows)
#> Wrote: nss66_consumer_expenditure_type_2_ds19.csv (100794 rows)
#> Wrote: nss66_consumer_expenditure_type_2_ds20.csv (468205 rows)
#> Wrote: nss66_consumer_expenditure_type_2_ds21.csv (4813463 rows)
#> Wrote: nss66_consumer_expenditure_type_2_ds22.csv (1217060 rows)
#> Wrote: nss66_consumer_expenditure_type_2_ds23.csv (365912 rows)
#> Wrote: nss66_consumer_expenditure_type_2_ds24.csv (2145291 rows)
#> Wrote: nss66_consumer_expenditure_type_2_ds25.csv (3076552 rows)
#> Wrote: nss66_consumer_expenditure_type_2_ds26.csv (3173462 rows)
list.files(output_dir)
#> [1] "nss66_consumer_expenditure_type_2_ds18.csv"
#> [2] "nss66_consumer_expenditure_type_2_ds19.csv"
#> [3] "nss66_consumer_expenditure_type_2_ds20.csv"
#> [4] "nss66_consumer_expenditure_type_2_ds21.csv"
#> [5] "nss66_consumer_expenditure_type_2_ds22.csv"
#> [6] "nss66_consumer_expenditure_type_2_ds23.csv"
#> [7] "nss66_consumer_expenditure_type_2_ds24.csv"
#> [8] "nss66_consumer_expenditure_type_2_ds25.csv"
#> [9] "nss66_consumer_expenditure_type_2_ds26.csv"

Pass compress = TRUE for .csv.gz output.
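When reading an exported CSV back into R, zero-padded codes such as `State` and `State_District` are worth protecting. The sketch below uses a small stand-in file rather than a real export, and it assumes the exported codes are stored as plain text like "01". How nesstar_export() actually quotes or types its columns is not shown in this vignette.

```r
# Toy CSV standing in for an exported file (values from the head() output).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("HH_ID,State,State_District,Total_Value",
             "844471101,01,0109,96"), csv_path)

# Default type guessing reads "01" as the integer 1.
guessed <- read.csv(csv_path)

# Declaring the code columns as character preserves the padding.
kept <- read.csv(csv_path,
                 colClasses = c(State = "character",
                                State_District = "character"))

guessed$State  # 1
kept$State     # "01"
```

Columns not named in `colClasses` are still type-guessed, so `Total_Value` remains numeric.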

Round reference

Round	idno	Period	Format
NSS 57th	`DDI-IND-MOSPI-NSSO-57Rnd-Sch1.0-2001`	2001	CSV zip
NSS 58th	`DDI-IND-MOSPI-NSSO-58Rnd-Sch1.0-2002`	2002	CSV zip
NSS 59th	`DDI-IND-MOSPI-NSSO-59Rnd-Sch1.0-2003`	2003	CSV zip
NSS 60th	`DDI-IND-MOSPI-NSSO-60Rnd-Sch1-Jan-June2004`	2004	CSV zip
NSS 61st	`DDI-IND-MOSPI-NSSO-61Rnd-Sch1-July2004-June2005`	2004-05	CSV zip
NSS 62nd	`DDI-IND-MOSPI-NSSO-62Rnd-Sch1.0-2005-06`	2005-06	CSV zip
NSS 63rd	`DDI-IND-MOSPI-NSSO-63Rnd-Sch1.0-2006-07`	2006-07	CSV zip
NSS 64th	`IND-NSSO-HCES-2007-v1`	2007-08	Nesstar
NSS 66th T1	`DDI-IND-NSSO-66-SCHEDULE-1.0T1`	2009-10	Nesstar
NSS 66th T2	`DDI-IND-NSSO-66-SCHEDULE-1.0T2`	2009-10	Nesstar
NSS 68th T1	`DDI-IND-MOSPI-NSSO-68Rnd-Sch1.0-July2011-June2012`	2011-12	Nesstar
NSS 68th T2	`DDI-IND-MOSPI-NSSO-68Rnd-Sch2.0-July2011-June2012`	2011-12	Nesstar

Rounds 57-63 unzip to CSVs. Rounds 64, 66, and 68 need nesstarR.