
Docker recommends creating volumes in the default location. However, designating a location makes data transfer easier. In particular, Docker containers for internal use are not exposed to the public, so security is not a big issue.

A couple of options:

Copy into container

docker cp /path/of/the/file <Container_ID>:/path/of/the/container/folder

This is not persistent, because no volume is involved; the file lives in the container's writable layer and disappears with the container.

Attach a directory as a volume

Create

docker volume create --name my_test_volume --opt type=none --opt device=/home/../Test_volume --opt o=bind
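To confirm the bind options took effect (my own sanity check, not part of the original steps):

docker volume inspect my_test_volume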

Mount

docker run -d \
--name container_name \
--mount source=my_test_volume,target=/mount_point \
image_name

These two steps are equivalent to declaring the volume in a Compose file and bringing it up:

version: '3'
services:
  nginx:
    image: image_name
    ports:
      - "8081:80"
    volumes:
      - my_test_volume:/mount_point
volumes:
  my_test_volume:
    driver: local
    driver_opts:
      o: bind
      type: none
      device: /home/../Test_volume
docker-compose up -d

scImpute

scimpute(
  count_path = '/home/ubuntu/download/cart/AHCA.nonimmu.trachea.csv', # full path to raw count matrix
  infile = "csv",   # format of input file
  outfile = "csv",  # format of output file
  out_dir = "/home/ubuntu/download/cart/AHCA.nonimmu.trachea.scimpute_dropprob0.3", # full path to output directory
  labeled = FALSE,  # cell type labels not available
  drop_thre = 0.3,  # threshold set on dropout probability
  Kcluster = 2,     # 2 cell subpopulations
  ncores = 24)      # number of cores used in parallel computation

DrImpute

library(DrImpute)

# Load the raw count matrix
X <- read.csv('/home/ubuntu/download/cart/AHCA.nonimmu.trachea.csv')
dim(X)

# DrImpute expects log-transformed expression values in a matrix
X.log <- log(X + 1)
X.log <- as.matrix(X.log)

# Impute with a fixed seed for reproducibility
set.seed(1)
X.imp <- DrImpute(X.log, mc.cores = 20)

# Convert back to the count scale and write out
X.imp.count <- exp(X.imp) - 1
write.csv(X.imp.count, '/home/ubuntu/download/cart/AHCA.trachea.drimpute.csv')

When the VPN is on, my virtual machines in VMware lose network connectivity, regardless of whether the guest OS is Windows or Linux.

I have not identified the root cause; I suspect the VPN client conflicts with VMware's network settings.

New version of R has the wrong path to libjvm.so

The following error pops up:

libjvm.so: cannot open shared object file: No such file or directory

Solution (From https://stackoverflow.com/questions/28462302):

  1. Find your R location. It is stored as rsession-ld-library-path in the rserver.conf file, or just run which R. The location is usually /usr/lib64/R/lib or /usr/lib64/microsoft-r/3.3/lib64/R/lib
  2. Find the libjvm.so file, which is usually under /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server, depending on which JRE you are using. Check the $JAVA_HOME environment variable.
  3. Create a symlink: sudo ln -s /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so /usr/lib64/microsoft-r/3.3/lib64/R/lib/libjvm.so
  4. Restart the R server
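Alternatively (my addition, not from the linked answer), letting R re-detect the Java installation often fixes the same error:

sudo R CMD javareconf
# then reinstall rJava inside R: install.packages('rJava')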

When we apply some methods to a tibble, there is an interesting but awkward behavior.

For a character (string) column, a tibble is not aware of the longest element in the column: it uses the first thousand elements to determine the column width. A problem then arises if the longest string occurs after the first thousand elements; you will lose several characters.
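A quick guard against this (my own check; df and text_col are placeholder names for your tibble and its character column) is to compute the true maximum width directly instead of trusting the inferred one:

# df and text_col are hypothetical placeholders
max(nchar(df$text_col), na.rm = TRUE)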

Flask:

  • Parse and respond to requests (see the sketch below)
  • Serve the API
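A minimal sketch of the Flask side; the route name and payload are hypothetical:

from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical endpoint: parse the request body and respond with JSON
@app.route('/api/echo', methods=['POST'])
def echo():
    payload = request.get_json()
    return jsonify(payload)

if __name__ == '__main__':
    app.run()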

AngularJS:

  • Content
  • Style

Single-cell technology generates at least billions of data points per run. Explicit (dense) representation is hard and memory-consuming. Most of the data points are 0, which can reflect either expression dropout or no expression. Thus, a sparse matrix is an appropriate strategy for data storage.

Sparse matrices in R are not a native data type but matrix-like objects supported by the ‘Matrix’ package.

Be careful with apply(), because it will first expand the sparse matrix into a dense one.
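A minimal sketch with the Matrix package (the dimensions and density are arbitrary):

library(Matrix)

# A 10,000 x 1,000 sparse matrix with ~1% non-zero entries
m <- rsparsematrix(nrow = 10000, ncol = 1000, density = 0.01)

print(object.size(m), units = 'MB')             # only non-zeros are stored
print(object.size(as.matrix(m)), units = 'MB')  # the dense copy is far larger

# apply() coerces m to a dense matrix first; prefer the sparse-aware
# methods the Matrix package provides:
rs <- rowSums(m)
cs <- colMeans(m)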

Multiprocessing may require loading the package explicitly on each worker:

foreach(..., .packages = 'Matrix')
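A fuller sketch with doParallel (the worker count and loop body are placeholders of mine):

library(foreach)
library(doParallel)

cl <- makeCluster(4)   # number of workers is arbitrary here
registerDoParallel(cl)

# .packages makes each worker load Matrix before evaluating the loop body
res <- foreach(i = 1:10, .packages = 'Matrix') %dopar% {
  m <- rsparsematrix(nrow = 1000, ncol = 1000, density = 0.01)
  sum(m)
}

stopCluster(cl)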

Two-factor authentication will soon be required on GitHub

Install ssh

sudo apt install openssh-client

Generate ssh key

ssh-keygen -t ed25519 -C "**@***.com"

Add the key to the ssh agent

open ~/.ssh/config

Add the following to the config file

Host *
    AddKeysToAgent yes
    UseKeychain yes # macOS only; needed if a passphrase was set in the previous step
    IdentityFile ~/.ssh/id_ed25519

Execute ssh-add (-K is macOS-specific; it stores the passphrase in the keychain)

ssh-add -K ~/.ssh/id_ed25519

Add the public key to GitHub

Settings -> SSH and GPG keys -> New SSH key -> paste the public key

Get Personal Access Tokens

Once two-factor is turned on, the account password no longer works for Git operations over HTTPS.

Obtain PATs

Settings -> Developer settings -> Personal access tokens

The username does not matter; supply the token as the password.
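For example (the URL placeholders are mine), the token can be pasted at the password prompt or embedded in the remote URL:

git clone https://github.com/<user>/<repo>.git
# when prompted, use any username and paste the token as the password

# or embed the token in the URL (it will be stored in plain text
# in .git/config, so treat with care)
git clone https://<TOKEN>@github.com/<user>/<repo>.git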

Move docker container and volume to another host

Image and container

My understanding of images and containers:
a container is a running instance of an image;
an image is a snapshot of a stopped container.

Move a container

  • Export the current container's filesystem into a tarball:
    docker export <CONTAINER ID> > /home/export.tar
  • Transfer it to the remote host (e.g. via sftp)
  • Import it as an image on the remote host:
    cat /home/export.tar | docker import - some-name:latest

Note: ‘docker export’ captures the container's filesystem, but not its metadata such as the default command.

For example, running the genuine RStudio image does not specify a command explicitly; the image supplies its default:

docker run -d -p 8787:8787 -e PASSWORD=yourpasswordhere rocker/rstudio:3.2.0

When running the imported image, Docker complains that no command was specified:

sudo docker run -d -ti --rm -p 8787:8787 --name RStudioServer --mount source=R_data,target=/home/rstudio/rdata rstudio:new
docker: Error response from daemon: No command specified.

We need to go back to the original host to find the command:

sudo docker ps
d79a11d5b7d0 rocker/rstudio:cancer_prediction "/init" 15 minutes ago Up 11 minutes 0.0.0.0:8787->8787/tcp RStudioServer

We can see that the command is ‘/init’, so we pass it explicitly when running the imported image:

sudo docker run -d -ti  --rm -p 8787:8787  --name RStudioServer --mount source=R_data,target=/home/rstudio/rdata   rstudio:fromTesla /init
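Alternatively (my addition, not from the original notes), docker import can set the command at import time via --change, so the extra argument becomes unnecessary:

cat /home/export.tar | docker import --change 'CMD ["/init"]' - rstudio:new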

Move a volume

There is no magic command to move a volume.
My practice is to compress the whole directory and copy it into the new container (mounted at the same entry point), as sketched below.
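A sketch of that workflow, reusing the volume name my_test_volume from above (the alpine helper image and paths are my choices):

# On the source host: archive the volume contents via a throwaway container
docker run --rm -v my_test_volume:/data -v "$PWD":/backup alpine \
    tar czf /backup/my_test_volume.tar.gz -C /data .

# Transfer my_test_volume.tar.gz to the new host (e.g. sftp), then restore:
docker volume create my_test_volume
docker run --rm -v my_test_volume:/data -v "$PWD":/backup alpine \
    tar xzf /backup/my_test_volume.tar.gz -C /data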

Numpy vectorization

Vectorization is a technique to accelerate computation (see numpy.vectorize, v1.20).
However, the output data type of the vectorized function is determined by calling the function with the first element of the input, unless otypes is specified.
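A minimal sketch (the function is a toy example of mine): pinning otypes avoids the trial call on the first element:

import numpy as np

def to_half(x):
    return x / 2

# Without otypes, np.vectorize calls to_half once on the first element
# just to determine the output dtype; otypes fixes it up front.
vec_half = np.vectorize(to_half, otypes=[np.float64])

print(vec_half(np.arange(10)))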

A thread on Stack Overflow describes this problem: How to avoid enormous additional memory consumption when using numpy vectorize? Briefly, np.vectorize evaluates each value as a Python object, which consumes a lot of memory.

I encountered this problem when I applied the technique: it consumed all of my memory (128 GB).
I tried expanding my swap to 256 GB, but the disk is not solid-state, so swapping is pretty slow.
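The underlying issue is that np.vectorize is a Python-level loop, not true vectorization. When the operation can be written with native ufuncs, doing so avoids the per-element Python objects entirely (a sketch with a toy operation of mine):

import numpy as np

arr = np.random.rand(1_000_000)

# np.vectorize: a Python-level loop creating one Python object per element
slow = np.vectorize(lambda x: x * 2 + 1)(arr)

# Native ufuncs: the same computation stays in C and allocates
# only the output array
fast = arr * 2 + 1

assert np.allclose(slow, fast)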