
Docker recommends creating volumes in the default location. However, designating a location makes data transfer easier. In particular, Docker containers for internal use are not exposed to the public, so security is not a big issue.

A couple of options:

Copy into container

docker cp /path/of/the/file <Container_ID>:/path/of/the/container/folder

This is not persistent, because no volume is involved; the file lives in the container's writable layer and disappears with the container.

Attach a directory as a volume

Create

docker volume create --name my_test_volume --opt type=none --opt device=/home/../Test_volume --opt o=bind
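To confirm the bind options took effect (my own sanity check, not part of the original steps):

docker volume inspect my_test_volume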

Mount

docker run -d \
--name container_name \
--mount source=my_test_volume,target=/mount_point \
image_name

These two steps are equivalent to declaring the volume in a Compose file and bringing it up:

version: '3'
services:
  nginx:
    image: image_name
    ports:
      - "8081:80"
    volumes:
      - my_test_volume:/mount_point
volumes:
  my_test_volume:
    driver: local
    driver_opts:
      o: bind
      type: none
      device: /home/../Test_volume
docker-compose up -d

scImpute

scimpute(
  count_path = '/home/ubuntu/download/cart/AHCA.nonimmu.trachea.csv', # full path to raw count matrix
  infile = "csv",   # format of input file
  outfile = "csv",  # format of output file
  out_dir = "/home/ubuntu/download/cart/AHCA.nonimmu.trachea.scimpute_dropprob0.3", # full path to output directory
  labeled = FALSE,  # cell type labels not available
  drop_thre = 0.3,  # threshold set on dropout probability
  Kcluster = 2,     # 2 cell subpopulations
  ncores = 24)      # number of cores used in parallel computation

DrImpute

library(DrImpute)

# Load the raw count matrix
X <- read.csv('/home/ubuntu/download/cart/AHCA.nonimmu.trachea.csv')
dim(X)

# DrImpute expects log-transformed expression values in a matrix
X.log <- log(X + 1)
X.log <- as.matrix(X.log)

# Impute with a fixed seed for reproducibility
set.seed(1)
X.imp <- DrImpute(X.log, mc.cores = 20)

# Convert back to the count scale and write out
X.imp.count <- exp(X.imp) - 1
write.csv(X.imp.count, '/home/ubuntu/download/cart/AHCA.trachea.drimpute.csv')

When the VPN is on, my virtual machines in VMware lose network connectivity, regardless of whether the guest OS is Windows or Linux.

I have not identified the root cause; I suspect the VPN client conflicts with VMware's network settings.

New version of R has the wrong path to libjvm.so

The following error pops up:

libjvm.so: cannot open shared object file: No such file or directory

Solution (From https://stackoverflow.com/questions/28462302):

  1. Find your R location. It is stored as rsession-ld-library-path in the rserver.conf file, or just run which R. The location is usually /usr/lib64/R/lib or /usr/lib64/microsoft-r/3.3/lib64/R/lib
  2. Find the libjvm.so file, which is usually under /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server, depending on which JRE you are using. Check the $JAVA_HOME environment variable.
  3. Create a symlink: sudo ln -s /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so /usr/lib64/microsoft-r/3.3/lib64/R/lib/libjvm.so
  4. Restart the R server
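Alternatively (my addition, not from the linked answer), letting R re-detect the Java installation often fixes the same error:

sudo R CMD javareconf
# then reinstall rJava inside R: install.packages('rJava')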

When we apply some methods to a tibble, there is an interesting but awkward behavior.

For a character (string) column, a tibble is not aware of the longest element in the column: it uses the first thousand elements to determine the column width. A problem then arises if the longest string occurs after the first thousand elements; you will lose several characters.
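A quick guard against this (my own check; df and text_col are placeholder names for your tibble and its character column) is to compute the true maximum width directly instead of trusting the inferred one:

# df and text_col are hypothetical placeholders
max(nchar(df$text_col), na.rm = TRUE)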

Flask:

  • Parse and respond to requests (see the sketch below)
  • Serve the API
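A minimal sketch of the Flask side; the route name and payload are hypothetical:

from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical endpoint: parse the request body and respond with JSON
@app.route('/api/echo', methods=['POST'])
def echo():
    payload = request.get_json()
    return jsonify(payload)

if __name__ == '__main__':
    app.run()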

AngularJS:

  • Content
  • Style

Single-cell technology generates at least billions of data points per run. Explicit (dense) representation is hard and memory-consuming. Most of the data points are 0, which can reflect either expression dropout or no expression. Thus, a sparse matrix is an appropriate strategy for data storage.

Sparse matrices in R are not a native data type but matrix-like objects supported by the ‘Matrix’ package.

Be careful with apply(), because it will first expand the sparse matrix into a dense one.
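A minimal sketch with the Matrix package (the dimensions and density are arbitrary):

library(Matrix)

# A 10,000 x 1,000 sparse matrix with ~1% non-zero entries
m <- rsparsematrix(nrow = 10000, ncol = 1000, density = 0.01)

print(object.size(m), units = 'MB')             # only non-zeros are stored
print(object.size(as.matrix(m)), units = 'MB')  # the dense copy is far larger

# apply() coerces m to a dense matrix first; prefer the sparse-aware
# methods the Matrix package provides:
rs <- rowSums(m)
cs <- colMeans(m)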

Multiprocessing may require loading the package explicitly on each worker:

foreach(..., .packages = 'Matrix')
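A fuller sketch with doParallel (the worker count and loop body are placeholders of mine):

library(foreach)
library(doParallel)

cl <- makeCluster(4)   # number of workers is arbitrary here
registerDoParallel(cl)

# .packages makes each worker load Matrix before evaluating the loop body
res <- foreach(i = 1:10, .packages = 'Matrix') %dopar% {
  m <- rsparsematrix(nrow = 1000, ncol = 1000, density = 0.01)
  sum(m)
}

stopCluster(cl)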

Two-factor authentication will soon be required on GitHub

Install ssh

sudo apt install openssh-client

Generate ssh key

ssh-keygen -t ed25519 -C "**@***.com"

Add the key to the ssh agent

open ~/.ssh/config

Add the following to the config file

Host *
    AddKeysToAgent yes
    UseKeychain yes # macOS only; needed if a passphrase was set in the previous step
    IdentityFile ~/.ssh/id_ed25519

Execute ssh-add (-K is macOS-specific; it stores the passphrase in the keychain)

ssh-add -K ~/.ssh/id_ed25519

Add the public key to GitHub

Settings -> SSH and GPG keys -> New SSH key -> paste the public key

Get Personal Access Tokens

Once two-factor is turned on, the account password no longer works for Git operations over HTTPS.

Obtain PATs

Settings -> Developer settings -> Personal access tokens

The username does not matter; supply the token as the password.
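For example (the URL placeholders are mine), the token can be pasted at the password prompt or embedded in the remote URL:

git clone https://github.com/<user>/<repo>.git
# when prompted, use any username and paste the token as the password

# or embed the token in the URL (it will be stored in plain text
# in .git/config, so treat with care)
git clone https://<TOKEN>@github.com/<user>/<repo>.git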

Move docker container and volume to another host

Image and container

My understanding of images and containers:
a container is a running instance of an image;
an image is a snapshot of a stopped container.

Move a container

  • Export the current container's filesystem into a tarball:
    docker export <CONTAINER ID> > /home/export.tar
  • Transfer it to the remote host (e.g. via sftp)
  • Import it as an image on the remote host:
    cat /home/export.tar | docker import - some-name:latest

Note: ‘docker export’ captures the container's filesystem, but not its metadata such as the default command.

For example, running the genuine RStudio image does not specify a command explicitly; the image supplies its default:

docker run -d -p 8787:8787 -e PASSWORD=yourpasswordhere rocker/rstudio:3.2.0

When running the imported image, Docker complains that no command was specified:

sudo docker run -d -ti --rm -p 8787:8787 --name RStudioServer --mount source=R_data,target=/home/rstudio/rdata rstudio:new
docker: Error response from daemon: No command specified.

We need to go back to the original host to find the command:

sudo docker ps
d79a11d5b7d0 rocker/rstudio:cancer_prediction "/init" 15 minutes ago Up 11 minutes 0.0.0.0:8787->8787/tcp RStudioServer

We can see that the command is ‘/init’, so we pass it explicitly when running the imported image:

sudo docker run -d -ti  --rm -p 8787:8787  --name RStudioServer --mount source=R_data,target=/home/rstudio/rdata   rstudio:fromTesla /init
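Alternatively (my addition, not from the original notes), docker import can set the command at import time via --change, so the extra argument becomes unnecessary:

cat /home/export.tar | docker import --change 'CMD ["/init"]' - rstudio:new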

Move a volume

There is no magic command to move a volume.
My practice is to compress the whole directory and copy it into the new container (mounted at the same entry point), as sketched below.
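A sketch of that workflow, reusing the volume name my_test_volume from above (the alpine helper image and paths are my choices):

# On the source host: archive the volume contents via a throwaway container
docker run --rm -v my_test_volume:/data -v "$PWD":/backup alpine \
    tar czf /backup/my_test_volume.tar.gz -C /data .

# Transfer my_test_volume.tar.gz to the new host (e.g. sftp), then restore:
docker volume create my_test_volume
docker run --rm -v my_test_volume:/data -v "$PWD":/backup alpine \
    tar xzf /backup/my_test_volume.tar.gz -C /data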

Numpy vectorization

Vectorization is a technique to accelerate computation (see numpy.vectorize, v1.20).
However, the output data type of the vectorized function is determined by calling the function with the first element of the input, unless otypes is specified.
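A minimal sketch (the function is a toy example of mine): pinning otypes avoids the trial call on the first element:

import numpy as np

def to_half(x):
    return x / 2

# Without otypes, np.vectorize calls to_half once on the first element
# just to determine the output dtype; otypes fixes it up front.
vec_half = np.vectorize(to_half, otypes=[np.float64])

print(vec_half(np.arange(10)))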

A thread on Stack Overflow describes this problem: How to avoid enormous additional memory consumption when using numpy vectorize? Briefly, np.vectorize evaluates each value as a Python object, which consumes a lot of memory.

I encountered this problem when I applied the technique: it consumed all of my memory (128 GB).
I tried expanding my swap to 256 GB, but the disk is not solid-state, so swapping is pretty slow.
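The underlying issue is that np.vectorize is a Python-level loop, not true vectorization. When the operation can be written with native ufuncs, doing so avoids the per-element Python objects entirely (a sketch with a toy operation of mine):

import numpy as np

arr = np.random.rand(1_000_000)

# np.vectorize: a Python-level loop creating one Python object per element
slow = np.vectorize(lambda x: x * 2 + 1)(arr)

# Native ufuncs: the same computation stays in C and allocates
# only the output array
fast = arr * 2 + 1

assert np.allclose(slow, fast)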