NSS rounds 64, 66, and 68 come as .Nesstar binaries
packed inside .rar archives. This vignette covers
downloading one round with mospiR, extracting the archive, and reading
the data with nesstarR.
Prerequisites
remotes::install_github("saketkc/mospiR")
remotes::install_github("saketkc/nesstarR")
Download from the portal
library(mospiR)
# Read the portal API key from the environment rather than hard-coding it
api_key <- Sys.getenv("MOSPI_KEY")
download_dataset(
"DDI-IND-NSSO-66-SCHEDULE-1.0T2",
file.path("data", "hces", "DDI-IND-NSSO-66-SCHEDULE-1.0T2"),
api_key
)
The download is a .rar archive, about 54 MB for NSS 66 T2.
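Because Sys.getenv() returns an empty string for an unset variable, a missing key would only show up as a failed request. A minimal sketch of an early check follows; require_env() is a hypothetical helper written for this illustration and is not part of mospiR.

```r
# Hypothetical helper (not part of mospiR): stop early when an environment
# variable is unset or empty, instead of sending a request with a blank key.
require_env <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", name, " is not set", call. = FALSE)
  }
  value
}

# Usage, assuming the key is stored in MOSPI_KEY as above:
# api_key <- require_env("MOSPI_KEY")
```

The check runs before any network call, so a misconfigured session fails with a message that names the missing variable.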
Extract the archive
Either unar (macOS/Linux) or unrar (Windows) works. Whichever
one you use must be on your PATH; the call below uses unar.
rar_file <- file.path(
"data", "hces", "DDI-IND-NSSO-66-SCHEDULE-1.0T2",
"Nss66_1.0-type2_new format.rar"
)
system2("unar", c("-o", dirname(rar_file), shQuote(rar_file)))
Extraction produces a folder with the .Nesstar binary,
ddi.xml (variable metadata), and supporting documents.
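The two tools take different arguments, so a script meant to run on several platforms has to branch on whichever one is installed. The sketch below shows one way to do that with base R only. find_extractor(), extractor_args(), and extract_rar() are hypothetical helpers, not functions from mospiR or nesstarR.

```r
# Return the first extractor found on the PATH, or NA if none is available.
find_extractor <- function(candidates = c("unar", "unrar")) {
  paths <- Sys.which(candidates)          # "" for tools that are not found
  found <- candidates[nzchar(paths)]
  if (length(found) == 0) NA_character_ else found[1]
}

# Build the command-line arguments for the chosen tool.
extractor_args <- function(tool, rar_file, dest_dir) {
  switch(tool,
    # unar: -o sets the output directory
    unar  = c("-o", shQuote(dest_dir), shQuote(rar_file)),
    # unrar: "x" extracts with full paths; the destination must end in a slash
    unrar = c("x", shQuote(rar_file), shQuote(paste0(dest_dir, "/"))),
    stop("Unsupported extractor: ", tool, call. = FALSE)
  )
}

# Extract next to the archive by default, as in the call above.
extract_rar <- function(rar_file, dest_dir = dirname(rar_file)) {
  tool <- find_extractor()
  if (is.na(tool)) stop("Neither unar nor unrar found on PATH", call. = FALSE)
  system2(tool, extractor_args(tool, rar_file, dest_dir))
}
```

Quoting both paths with shQuote() matters here because the archive name contains a space.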
Parse the Nesstar file
nesstar_parse() reads the binary header without loading
any row data. Data loads only when you call
nesstar_read_dataset() on a specific dataset number.
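library(nesstarR)
# Locate the extracted .Nesstar file under the download folder
nesstar_path <- list.files(dirname(rar_file), pattern = "\\.Nesstar$",
recursive = TRUE, full.names = TRUE)[1]
nb <- nesstar_parse(nesstar_path)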
nb
#> <nesstar_binary>
#> File : nss66_consumer_expenditure_type_2.Nesstar
#> Datasets : 9
Dataset structure
A single .Nesstar file holds multiple datasets, one per
schedule block.
nesstar_datasets(nb)
#> dataset_number row_count variable_count
#> 1 18 100794 49
#> 2 19 100794 50
#> 3 20 468205 40
#> 4 21 4813463 33
#> 5 22 1217060 30
#> 6 23 365912 29
#> 7 24 2145291 29
#> 8 25 3076552 36
#> 9 26 3173462 29
Dataset 21 is the food block: 4.8 million item-level rows across roughly 100,000 households.
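Row counts alone give a rough sense of each block's granularity. The sketch below retypes the counts from the listing above into a plain data frame and divides by the household count. Treating the smallest row count as the number of households is an assumption made for this illustration, not something the file states.

```r
# Row and variable counts copied from the nesstar_datasets() output above.
datasets <- data.frame(
  dataset_number = 18:26,
  row_count = c(100794, 100794, 468205, 4813463, 1217060,
                365912, 2145291, 3076552, 3173462),
  variable_count = c(49, 50, 40, 33, 30, 29, 29, 36, 29)
)

# Assumption: the smallest row count equals the number of surveyed households.
n_households <- min(datasets$row_count)

# Average number of rows each household contributes to a block.
datasets$rows_per_household <- round(datasets$row_count / n_households, 1)

# Largest blocks first.
datasets[order(-datasets$row_count), ]
```

Under that assumption the food block averages about 48 rows per household, which is consistent with one row per item consumed.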
Variable listing
vars <- nesstar_variables(nb, dataset_number = 21)
vars[, c("name", "variable_id", "width_value")]
#> name variable_id width_value
#> 1 HH_ID 1724 9
#> 2 centre_code 1694 3
#> 3 FSU_Serial_number 1695 5
#> 4 Round 1696 2
#> 5 Schedule_Number 1697 3
#> 6 Sample 1698 1
#> 7 Sector 1699 1
#> 8 State 1725 2
#> 9 Region 1700 3
#> 10 State_District 1726 4
#> 11 Stratum 1702 2
#> 12 Sub_Stratum 1703 1
#> 13 Schedule_type 1704 1
#> 14 Sub_Round 1705 1
#> 15 Sub_Sample 1706 1
#> 16 FOD_Sub_Region 1707 4
#> 17 hg_sb_Number 1708 1
#> 18 Second_Stage_Stratum 1709 1
#> 19 HHS_no 1710 2
#> 20 Level 1711 2
#> 21 Filler 1712 2
#> 22 Item_code 1713 3
#> 23 HP_Quantity 1714 8
#> 24 HP_Value 1715 5
#> 25 Total_Quantity 1716 8
#> 26 Total_Value 1717 5
#> 27 Source_Code 1718 1
#> 28 Ok_stamp 1719 1
#> 29 Blank 1720 1
#> 30 NSS 1721 2
#> 31 NSC 1722 3
#> 32 MLT 1723 8
#> 33 Multiplier 1727 8
Key columns in the food block:
| Column | Meaning |
|---|---|
| HH_ID | Household identifier |
| State | State code (2-digit) |
| State_District | District code (4-digit: state × 100 + district) |
| Item_code | Food item code (NSS 66 coding) |
| Total_Value | Household monthly expenditure (Rs) |
| Multiplier | Survey weight |
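The State_District layout can be undone with integer division. The sketch below follows the "state × 100 + district" rule from the table; split_state_district() is a hypothetical helper, and the second input code is only an arithmetic illustration.

```r
# Split a 4-digit State_District code into its two parts.
split_state_district <- function(code) {
  code <- as.integer(code)          # "0109" -> 109
  data.frame(
    state = code %/% 100L,          # leading two digits
    district = code %% 100L         # trailing two digits
  )
}

split_state_district(c("0109", "2714"))
#>   state district
#> 1     1        9
#> 2    27       14
```

The result is numeric, so format it with sprintf("%02d", ...) if you need to match the zero-padded State column.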
Read a dataset
food <- nesstar_read_dataset(nb, dataset_number = 21)
cat("Rows:", nrow(food), "| Columns:", ncol(food), "\n")
#> Rows: 4813463 | Columns: 33
head(food[, c("HH_ID", "State", "State_District",
"Item_code", "Total_Value", "Multiplier")])
#> HH_ID State State_District Item_code Total_Value Multiplier
#> 1 844471101 01 0109 101 96 105.925
#> 2 844471101 01 0109 102 200 105.925
#> 3 844471101 01 0109 107 153 105.925
#> 4 844471101 01 0109 108 210 105.925
#> 5 844471101 01 0109 111 4 105.925
#> 6 844471101 01 0109 129 663 105.925
Quick check: cereal spending by sector
Weighted mean monthly expenditure on cereals (item codes 101-128), rural vs. urban:
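# Keep cereal items only
cereals <- food[food$Item_code >= 101 & food$Item_code <= 128, ]
# Sum cereal spending per household; grouping by Sector and Multiplier
# carries them through, assuming both are constant within a household
hh_cereals <- aggregate(Total_Value ~ HH_ID + Sector + Multiplier,
data = cereals, FUN = sum)
# Sector 1 is rural, sector 2 is urban
rural <- hh_cereals[hh_cereals$Sector == 1, ]
urban <- hh_cereals[hh_cereals$Sector == 2, ]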
cat(sprintf(
"Weighted mean cereal expenditure (Rs/month):\n Rural: %.0f\n Urban: %.0f\n",
weighted.mean(rural$Total_Value, rural$Multiplier, na.rm = TRUE),
weighted.mean(urban$Total_Value, urban$Multiplier, na.rm = TRUE)
))
#> Weighted mean cereal expenditure (Rs/month):
#> Rural: 682
#> Urban: 713
Export to CSV
nesstar_export() writes one CSV per dataset to
output_dir.
output_dir <- file.path(tempdir(), "nss66t2")
nesstar_export(nb, output_dir = output_dir, compress = FALSE)
#> Wrote: nss66_consumer_expenditure_type_2_ds18.csv (100794 rows)
#> Wrote: nss66_consumer_expenditure_type_2_ds19.csv (100794 rows)
#> Wrote: nss66_consumer_expenditure_type_2_ds20.csv (468205 rows)
#> Wrote: nss66_consumer_expenditure_type_2_ds21.csv (4813463 rows)
#> Wrote: nss66_consumer_expenditure_type_2_ds22.csv (1217060 rows)
#> Wrote: nss66_consumer_expenditure_type_2_ds23.csv (365912 rows)
#> Wrote: nss66_consumer_expenditure_type_2_ds24.csv (2145291 rows)
#> Wrote: nss66_consumer_expenditure_type_2_ds25.csv (3076552 rows)
#> Wrote: nss66_consumer_expenditure_type_2_ds26.csv (3173462 rows)
list.files(output_dir)
#> [1] "nss66_consumer_expenditure_type_2_ds18.csv"
#> [2] "nss66_consumer_expenditure_type_2_ds19.csv"
#> [3] "nss66_consumer_expenditure_type_2_ds20.csv"
#> [4] "nss66_consumer_expenditure_type_2_ds21.csv"
#> [5] "nss66_consumer_expenditure_type_2_ds22.csv"
#> [6] "nss66_consumer_expenditure_type_2_ds23.csv"
#> [7] "nss66_consumer_expenditure_type_2_ds24.csv"
#> [8] "nss66_consumer_expenditure_type_2_ds25.csv"
#> [9] "nss66_consumer_expenditure_type_2_ds26.csv"
Pass compress = TRUE for .csv.gz output.
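Code columns such as State print with leading zeros, and default type guessing drops them when a CSV is read back. The sketch below uses a small hand-written CSV in place of an exported file, with values taken from the head(food) output earlier. Whether the exported files keep the zero padding is an assumption here and worth checking on a real export.

```r
# Toy CSV standing in for one exported file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "HH_ID,State,State_District,Item_code,Total_Value,Multiplier",
  "844471101,01,0109,101,96,105.925",
  "844471101,01,0109,102,200,105.925"
), csv_path)

# Default type guessing turns "01" into the integer 1.
guessed <- read.csv(csv_path)

# Reading the code columns as character keeps the leading zeros;
# columns not named in colClasses are still guessed.
code_cols <- c(HH_ID = "character", State = "character",
               State_District = "character")
typed <- read.csv(csv_path, colClasses = code_cols)

guessed$State[1]
#> [1] 1
typed$State[1]
#> [1] "01"
```

Only the identifier and code columns are forced to character, so Total_Value and Multiplier remain numeric for arithmetic.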
Round reference
| Round | idno | Period | Format |
|---|---|---|---|
| NSS 57th | DDI-IND-MOSPI-NSSO-57Rnd-Sch1.0-2001 | 2001 | CSV zip |
| NSS 58th | DDI-IND-MOSPI-NSSO-58Rnd-Sch1.0-2002 | 2002 | CSV zip |
| NSS 59th | DDI-IND-MOSPI-NSSO-59Rnd-Sch1.0-2003 | 2003 | CSV zip |
| NSS 60th | DDI-IND-MOSPI-NSSO-60Rnd-Sch1-Jan-June2004 | 2004 | CSV zip |
| NSS 61st | DDI-IND-MOSPI-NSSO-61Rnd-Sch1-July2004-June2005 | 2004-05 | CSV zip |
| NSS 62nd | DDI-IND-MOSPI-NSSO-62Rnd-Sch1.0-2005-06 | 2005-06 | CSV zip |
| NSS 63rd | DDI-IND-MOSPI-NSSO-63Rnd-Sch1.0-2006-07 | 2006-07 | CSV zip |
| NSS 64th | IND-NSSO-HCES-2007-v1 | 2007-08 | Nesstar |
| NSS 66th T1 | DDI-IND-NSSO-66-SCHEDULE-1.0T1 | 2009-10 | Nesstar |
| NSS 66th T2 | DDI-IND-NSSO-66-SCHEDULE-1.0T2 | 2009-10 | Nesstar |
| NSS 68th T1 | DDI-IND-MOSPI-NSSO-68Rnd-Sch1.0-July2011-June2012 | 2011-12 | Nesstar |
| NSS 68th T2 | DDI-IND-MOSPI-NSSO-68Rnd-Sch2.0-July2011-June2012 | 2011-12 | Nesstar |
Rounds 57-63 unzip to CSVs. Rounds 64, 66, and 68 need nesstarR.
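For scripts that loop over several rounds, the reference table can be kept as a small lookup. The sketch below copies the identifiers and formats from the table; needs_nesstar() is a hypothetical helper and not part of either package.

```r
# Identifiers and formats copied from the round reference table above.
rounds <- data.frame(
  round = c("57", "58", "59", "60", "61", "62", "63",
            "64", "66 T1", "66 T2", "68 T1", "68 T2"),
  idno = c(
    "DDI-IND-MOSPI-NSSO-57Rnd-Sch1.0-2001",
    "DDI-IND-MOSPI-NSSO-58Rnd-Sch1.0-2002",
    "DDI-IND-MOSPI-NSSO-59Rnd-Sch1.0-2003",
    "DDI-IND-MOSPI-NSSO-60Rnd-Sch1-Jan-June2004",
    "DDI-IND-MOSPI-NSSO-61Rnd-Sch1-July2004-June2005",
    "DDI-IND-MOSPI-NSSO-62Rnd-Sch1.0-2005-06",
    "DDI-IND-MOSPI-NSSO-63Rnd-Sch1.0-2006-07",
    "IND-NSSO-HCES-2007-v1",
    "DDI-IND-NSSO-66-SCHEDULE-1.0T1",
    "DDI-IND-NSSO-66-SCHEDULE-1.0T2",
    "DDI-IND-MOSPI-NSSO-68Rnd-Sch1.0-July2011-June2012",
    "DDI-IND-MOSPI-NSSO-68Rnd-Sch2.0-July2011-June2012"
  ),
  format = rep(c("CSV zip", "Nesstar"), times = c(7, 5)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when a round's download needs nesstarR.
needs_nesstar <- function(idno) {
  i <- match(idno, rounds$idno)
  if (anyNA(i)) stop("Unknown idno: ", idno[is.na(i)][1], call. = FALSE)
  rounds$format[i] == "Nesstar"
}

needs_nesstar("DDI-IND-NSSO-66-SCHEDULE-1.0T2")
#> [1] TRUE
```

An unknown identifier raises an error instead of returning NA, so a typo in a batch script fails at the lookup and not midway through a download.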