A Simulated RDS Data Set with no seed dependency
This is a faux set used to illustrate how the estimators perform under different populations and RDS schemes.
An rds.data.frame
The population has N=1000 nodes. The sample size is 500, so the sample fraction (50%) is relatively small compared with the 70% of the sister data set fauxsycamore. There is homophily on disease status (R=5), and there is differential activity by disease status whereby the infected nodes have mean degree twice that of the uninfected (w=1.8).
In the sampling, the seeds are chosen randomly from the full population, so there is no dependency induced by seed selection.
Each sample member is given 2 uniquely identified coupons to distribute to other members of the target population whom they know. Furthermore, each respondent distributes their coupons completely at random among those they are connected to.
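The recruitment scheme above (seeds drawn at random from the whole population, two coupons per respondent, coupons passed at random to acquaintances) can be sketched in base R. This is a minimal illustration, not the simulation code used to build fauxmadrona: the function `simulate_rds`, the toy random graph, and the rule that coupons go only to not-yet-sampled neighbours (with unusable coupons simply lost) are all assumptions made for the example.

```r
# Hypothetical sketch of the RDS recruitment process described above.
# Assumptions: undirected network stored as an adjacency list; each node
# can be recruited at most once; coupons that cannot reach an unsampled
# neighbour are lost.
simulate_rds <- function(adj, n_seeds, n_target, n_coupons = 2) {
  N <- length(adj)
  # Seeds drawn uniformly from the full population (no seed dependency)
  seeds <- sample.int(N, n_seeds)
  sampled <- seeds
  recruiter <- rep(NA_integer_, n_seeds)
  queue <- seeds
  while (length(queue) > 0 && length(sampled) < n_target) {
    current <- queue[1]
    queue <- queue[-1]
    # Acquaintances of the current respondent not yet in the sample
    eligible <- setdiff(adj[[current]], sampled)
    k <- min(n_coupons, length(eligible), n_target - length(sampled))
    if (k > 0) {
      # Coupons handed out completely at random among eligible contacts
      recruits <- eligible[sample.int(length(eligible), k)]
      sampled <- c(sampled, recruits)
      recruiter <- c(recruiter, rep(current, k))
      queue <- c(queue, recruits)
    }
  }
  data.frame(id = sampled, recruiter = recruiter)
}

# Toy undirected random graph (hypothetical; not the fauxmadrona network)
set.seed(1)
N <- 200
upper <- upper.tri(matrix(0, N, N)) & matrix(runif(N * N), N, N) < 0.03
edges <- which(upper, arr.ind = TRUE)
adj <- lapply(seq_len(N), function(v)
  c(edges[edges[, 1] == v, 2], edges[edges[, 2] == v, 1]))

samp <- simulate_rds(adj, n_seeds = 10, n_target = 100)
head(samp)
```

Each row records a respondent and who recruited them (`NA` for seeds), which is the recruitment-tree information an RDS data set carries.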
Here are the results for this data set and the sister data set fauxsycamore:
Name | City | Type, sample fraction | Mean | RDS I (SH) | RDS II (VH) | SS |
fauxsycamore | Oxford | seed dependency, 70% | 0.2408 | 0.1087 | 0.1372 | 0.1814 |
fauxmadrona | Seattle | no seed dependency, 50% | 0.2592 | 0.1592 | 0.1644 | 0.1941 |
Even with only a 50% sample, the VH estimator is substantially biased, and the SS estimator does much better.
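To make the difference between the unweighted mean and the VH column concrete, here is a minimal base-R sketch of the RDS II (Volz-Heckathorn) estimator in its usual inverse-degree weighted form, the sum of y_i/d_i divided by the sum of 1/d_i. The function name `rds2_vh` and the toy data are hypothetical; the RDS package's own implementation may differ in details such as handling of missing degrees. The SH and SS estimators are not shown here (SS additionally needs the population size and a successive-sampling model).

```r
# Hypothetical sketch: RDS II (Volz-Heckathorn) estimate of a proportion.
# Each respondent is weighted by 1 / degree, since under the RDS model
# high-degree nodes are more likely to be sampled.
rds2_vh <- function(y, degree) {
  stopifnot(length(y) == length(degree), all(degree > 0))
  w <- 1 / degree
  sum(w * y) / sum(w)
}

# Toy data: infected respondents (y = 1) report twice the degree
y      <- c(1, 1, 0, 0, 0)
degree <- c(4, 4, 2, 2, 2)

mean(y)             # unweighted sample mean: 0.4
rds2_vh(y, degree)  # inverse-degree weighted estimate: 0.25
```

Down-weighting the high-degree infected respondents pulls the estimate below the sample mean. The weights correct only for differential degree, so the estimator can remain biased when, as here, the sample fraction is large and the design departs from the model's with-replacement assumption.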
The original network is included as fauxmadrona.network, a network object.
The data set also includes the data.frame of the RDS data set as fauxmadrona.
Use data(package="RDS") to get a full list of datasets.
Gile, Krista J. and Handcock, Mark S. (2010). Respondent-driven Sampling: An Assessment of Current Methodology. Sociological Methodology, 40, 285-327.